Abstract: In this paper, we propose a new algorithm for recovery of
low-rank matrices from compressed linear measurements. The underlying
idea of this algorithm is to closely approximate the rank function
with a smooth function of singular values, and then minimize the
resulting approximation subject to the linear constraints. The
accuracy of the approximation is controlled via a scaling parameter
$\delta$, where a smaller $\delta$ corresponds to a more accurate
fitting. The consequent optimization problem for any finite $\delta$
is nonconvex. Therefore, in order to decrease the risk of ending up in
local minima, a series of optimizations is performed, starting with
optimizing a rough approximation (a large $\delta$) and followed by
successively optimizing finer approximations of the rank with smaller
$\delta$'s. To solve the optimization problem for any $\delta > 0$, it
is converted to a new program in which the cost is a function of two
auxiliary positive semidefinite variables. The paper shows that this
new program is concave and applies a majorize-minimize technique to
solve it which, in turn, leads to a few convex optimization
iterations. This optimization scheme is also equivalent to a
reweighted Nuclear Norm Minimization (NNM), where the weighting update
depends on the approximating function used. For any $\delta > 0$, we
derive a necessary and sufficient condition for exact recovery which
is weaker than the one corresponding to NNM. On the numerical side,
the proposed algorithm is compared to NNM and a reweighted NNM in
solving affine rank minimization and matrix completion problems,
showing its considerable and consistent superiority in terms of
success rate, especially when the number of measurements decreases
toward the lower bound for the unique representation.

Index Terms: Affine Rank Minimization (ARM), Matrix Completion (MC),
Nuclear Norm Minimization (NNM), Rank Approximation, Null-Space
Property (NSP).


I. Introduction 

Recovery of low-rank matrices from underdetermined linear
measurements, a generalization of the recovery of sparse vectors from
incomplete measurements, has become a topic of high interest within
the past few years in signal processing, control theory, and
mathematics. This problem has many applications in various areas of
engineering, for example, collaborative filtering [1], ultrasonic
tomography [2], direction-of-arrival estimation [3], and machine
learning [4]. For more comprehensive lists of applications, we refer
the reader to [?].

Mathematically speaking, the rank minimization (RM) problem under
affine equality constraints (linear measurements), which we refer to
as ARM, is described by

$$\min_{\mathbf{X}} \operatorname{rank}(\mathbf{X}) \quad \text{subject to} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}, \tag{1}$$

in which $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2}$ is the
optimization variable, $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to
\mathbb{R}^{m}$ is a linear measurement operator, and $\mathbf{b} \in
\mathbb{R}^{m}$ is the vector of available measurements. The
constraints are underdetermined, meaning that $m < n_1 n_2$ or, more
often, $m \ll n_1 n_2$. The above formulation has the so-called matrix
completion (MC) problem as an important instance, corresponding to

$$\min_{\mathbf{X}} \operatorname{rank}(\mathbf{X}) \quad \text{subject to} \quad [\mathbf{X}]_{ij} = [\mathbf{M}]_{ij}, \ \forall (i,j) \in \Omega, \tag{2}$$

where $\mathbf{M} \in \mathbb{R}^{n_1 \times n_2}$ is the matrix whose
elements are partially known, $\Omega \subset \{1, 2, \ldots, n_1\}
\times \{1, 2, \ldots, n_2\}$ is the set of the indexes of known
entries of $\mathbf{M}$, and $[\mathbf{X}]_{ij}$ designates the
$(i,j)$th entry of $\mathbf{X}$. When $\operatorname{rank}(\mathbf{X}^*)$
is sufficiently low and $\mathcal{A}$ has some favorable properties,
$\mathbf{X}^*$ is a unique solution to (1) [?].

Nevertheless, (1) is in general NP-hard and very challenging to solve
[?]. A well-known replacement is the nuclear norm minimization (NNM)
approach [?], formulated as

$$\min_{\mathbf{X}} \|\mathbf{X}\|_* \quad \text{subject to} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}, \tag{3}$$

where $\|\mathbf{X}\|_*$ denotes the nuclear norm of $\mathbf{X}$,
equal to the sum of the singular values of $\mathbf{X}$. It has been
shown that, under more restrictive assumptions on the rank of
$\mathbf{X}^*$ or properties of $\mathcal{A}$, (1) and (3) share the
same unique solution $\mathbf{X}^*$ [5].

When measurements are contaminated by additive noise, one way to
robustly find a solution is to update (1) to

$$\min_{\mathbf{X}} \operatorname{rank}(\mathbf{X}) \quad \text{subject to} \quad \|\mathcal{A}(\mathbf{X}) - \mathbf{b}\|_2 \le \epsilon, \tag{4}$$

where $\|\cdot\|_2$ denotes the $\ell_2$ norm and $\epsilon$ is some
constant not less than the noise power. Accordingly, (3) is also
converted to

$$\min_{\mathbf{X}} \|\mathbf{X}\|_* \quad \text{subject to} \quad \|\mathcal{A}(\mathbf{X}) - \mathbf{b}\|_2 \le \epsilon. \tag{5}$$

Again, under some mild conditions on $\operatorname{rank}(\mathbf{X}^*)$
and properties of $\mathcal{A}$, the solution of (5) is close to the
solution of (4) in terms of their distance measured by the Frobenius
norm [?].

There are some other approaches to solve the ARM problem. Some of them
are efficient implementations of NNM such as FPCA [10], APG [11], and
SVT [12]. Some others are based on generalization of the methods
already proposed for sparse recovery in the framework of compressive
sampling (CS), like ADMiRA [14] and SRF [15], which extend CoSaMP [16]
and SL0 [17] to the matrix case, respectively.

Despite the convexity of the NNM program, there is a large gap between
the sufficient conditions for the exact and robust recovery of
low-rank matrices using (1) and (3) [18]. To narrow this gap, we
introduce a novel algorithm based on successive and iterative
minimization of a series of nonconvex replacements for (1). Although
our theoretical analysis only shows that global minimization of each
replacement in the series recovers solutions at least as good as those
of the NNM approach, our numerical simulations demonstrate that the
proposed chain of minimizations results in a considerable reduction in
the number of samples required to recover low-rank matrices. This
improvement is achieved at the cost of higher computational
complexity. Nevertheless, in some applications of MC and ARM, like
magnetic resonance imaging [19], [20], quantum state tomography [21],
and system identification and low-order realization of linear systems
[5], a reduction in the number of samples can be very beneficial,
whereas complexity is not a big concern.

We improve over the method of SRF in [15], [22], which uses a class of
nonconvex functions to approximate the rank function and iteratively
minimizes the resulting approximation. In [15], the nonconvex cost
function scales with a parameter $\delta$ which reflects the accuracy:
the smaller $\delta$, the more accurate the approximation of the rank.
SRF starts with a large $\delta$ and decreases it gradually to gain
more accurate approximations of (1), successively optimizing the
series of approximations. Numerical simulations show superiority of
SRF to NNM and some other state-of-the-art algorithms in both MC and
ARM problems [15]; however, since the collection of exploited
functions lacks the subadditivity property, there is no guarantee that
globally minimizing the proposed replacement of (1) for any
$\delta > 0$ leads to the exact recovery of the minimum-rank solution,
except for the asymptotic case of $\delta \to 0$.

In this paper, we use a class of subadditive approximating functions
instead. As a result, a necessary and sufficient condition for exact
recovery is derived for any $\delta > 0$ which is weaker than that of
NNM. In addition, we show that, under the same conditions, all
matrices of rank equal to or higher than what is guaranteed by (3) can
be uniquely recovered by globally minimizing the cost function for any
nonzero $\delta$. Another interesting result shows that as
$\delta \to \infty$, the proposed optimization coincides with NNM.

To solve the resulting optimization problems, similar to [23], we
convert them to other programs in which the domain of the
approximating functions is limited to the cone of Positive
SemiDefinite (PSD) matrices. In this fashion, the rank approximating
functions are concave and differentiable, so we use a
Majorize-Minimize (MM) technique consisting of a few SemiDefinite
Programs (SDP) to optimize them. Hence, we term our method ICRA,
standing for Iterative Concave Rank Approximation. It is further shown
that the employed MM approach finds at least a local minimum of the
original concave program.

The rest of this paper is organized as follows. After presenting the
notations used throughout the paper, in Section II the main idea and
details of the proposed algorithm are described. Section III gives
some theoretical guarantees for the ICRA method as well as a theorem
proving the convergence of the exploited optimization scheme. In
Section IV, the proofs of theorems and lemmas are presented. In
Section V, some empirical results from the ICRA method are presented,
and it is compared against SRF [15], NNM, and reweighted NNM [23].
Section VI concludes the paper.

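Notations: For any $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2}$,
$n = \min(n_1, n_2)$, $\sigma_i(\mathbf{X})$ denotes the $i$th largest
singular value, $\boldsymbol{\sigma}(\mathbf{X}) =
(\sigma_1(\mathbf{X}), \ldots, \sigma_n(\mathbf{X}))^T$, and
$\|\mathbf{X}\|_* = \sum_{i=1}^{n} \sigma_i(\mathbf{X})$ is the
nuclear norm. Besides, it is always assumed that singular values of
matrices are sorted in descending order. $\operatorname{vec}(\mathbf{X})$
denotes the vector in $\mathbb{R}^{n_1 n_2}$ with the columns of
$\mathbf{X}$ stacked on top of one another. $\mathbb{S}^n$ and
$\mathbb{S}^n_+$ are used to denote the sets of symmetric and positive
semidefinite $n \times n$ real matrices, respectively. For any
$\mathbf{Y} \in \mathbb{S}^n$, $\lambda_i(\mathbf{Y})$ designates the
$i$th largest eigenvalue in magnitude,
$\boldsymbol{\lambda}(\mathbf{Y}) =
\boldsymbol{\lambda}^{\downarrow}(\mathbf{Y}) =
(\lambda_1(\mathbf{Y}), \ldots, \lambda_n(\mathbf{Y}))^T$ is the
vector of eigenvalues of $\mathbf{Y}$, and
$\operatorname{trace}(\mathbf{Y}) = \sum_{i=1}^{n}
\lambda_i(\mathbf{Y})$. Also,
$\boldsymbol{\lambda}^{\uparrow}(\mathbf{Y}) =
(\lambda_n(\mathbf{Y}), \ldots, \lambda_1(\mathbf{Y}))^T$ denotes the
vector of eigenvalues of $\mathbf{Y}$ in ascending order. For
$\mathbf{Y}, \mathbf{Z} \in \mathbb{S}^n$, $\mathbf{Y} \succeq
\mathbf{Z}$ and $\mathbf{Y} \succ \mathbf{Z}$ mean that
$\mathbf{Y} - \mathbf{Z}$ is positive semidefinite and positive
definite, respectively. Let $\langle \mathbf{X}, \mathbf{Y} \rangle =
\operatorname{trace}(\mathbf{X}^T \mathbf{Y})$ and
$\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T \mathbf{y}$ be
the inner products on matrix and vector spaces, respectively. As a
result, $\|\mathbf{X}\|_F = \langle \mathbf{X}, \mathbf{X}
\rangle^{1/2} = \big(\sum_{i=1}^{n} \sigma_i^2(\mathbf{X})\big)^{1/2}$
denotes the Frobenius norm, and $\|\mathbf{x}\|_2 = \langle
\mathbf{x}, \mathbf{x} \rangle^{1/2}$ stands for the Euclidean norm.
Moreover, $\|\mathbf{x}\|_\infty = \max_i |x_i|$ designates the
maximum norm, and $\lceil x \rceil$ denotes the smallest integer
greater than or equal to $x$. $\mathbf{I}_n$ is the identity matrix of
order $n$. For a linear operator $\mathcal{A} : \mathbb{R}^{n_1 \times
n_2} \to \mathbb{R}^{m}$, let $\mathcal{N}(\mathcal{A}) =
\{\mathbf{X} \in \mathbb{R}^{n_1 \times n_2} \mid
\mathcal{A}(\mathbf{X}) = \mathbf{0}\}$.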


II. The ICRA Algorithm 


A. Introduction 
Let

$$u(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \end{cases}$$

denote the unit step function for $x \ge 0$, so that the rank of a
matrix $\mathbf{X}$ equals $\sum_{i=1}^{n} u(\sigma_i(\mathbf{X}))$.
As $u(x)$ is discontinuous and nondifferentiable, direct minimization
of the rank is very hard, and all available exact optimizers have
doubly exponential complexity [8]. Consequently, one approach to solve
(1) is to approximate the unit step function with a suitable
one-variable function $f(x)$ and minimize $F(\mathbf{X}) =
\sum_{i=1}^{n} f(\sigma_i(\mathbf{X}))$ as an approximation of the
rank function. Herein, for the sake of brevity, we refer to the
one-variable and matrix-variable functions $f(x)$ and $F(\mathbf{X})$
as unit step approximating (UA) and rank approximating (RA) functions,
respectively.

Implicitly or explicitly, different one-variable functions have been
used to approximate $u(x)$ in some of the existing rank minimization
methods. Figure 1 illustrates some of the available options for
approximating the unit step as well as one of the functions used in
this work. In this plot, $f(x) = x$ has the worst fitting, though it
leads to nuclear norm minimization, which is the tightest convex
relaxation of (1) [5]. $f(x) = x^p$, $0 < p < 1$, which is closer to
$u(x)$, yields Schatten-$p$ quasi-norm minimization [24]. In [24], it
is shown theoretically that finding the global solution of constrained
Schatten-$p$ quasi-norm minimization outperforms NNM. Moreover,
experimental observations show superiority of this method to NNM [25],
[26]. $f(x) = \log(x + \alpha)$, in which $\alpha$ is some small
constant to ensure positivity of the argument of $\log(\cdot)$, also
results in better performance in recovering low-rank matrices in
numerical simulations [23].

Fig. 1. It is known that $\operatorname{rank}(\mathbf{X}) =
\sum_{i=1}^{n} u(\sigma_i(\mathbf{X}))$. Therefore, approximation of
the rank function can be converted to the problem of approximating
$u(x)$. Different functions have been used in the literature of rank
minimization to approximate the unit step, and some of them are
plotted in this figure. Among them, $f(x) = 1 - e^{-x/\delta}$ closely
matches $u(x)$.

Having the above theoretical and experimental results in mind, we
expect that finer approximations will give rise to higher performance
in recovery of low-rank matrices. Accordingly, we propose using other
UA functions like $f(x) = 1 - e^{-x/\delta}$ that closely match $u(x)$
for small values of $\delta$. Obviously, $f(x) = 1 - e^{-x/\delta}$ is
the best approximation among the functions depicted in Figure 1 in the
sense that $\int_0^\infty |f(t) - u(t)|^2 \, dt = \delta/2$ is finite
for every $\delta > 0$. Furthermore, by this choice, one can control
the merit of the approximation by adjusting the parameter $\delta$.
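
The value $\delta/2$ of the squared gap can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `f_delta` and `squared_gap`, the midpoint rule, and the truncation point `40 * delta` are illustrative choices and not part of the original method.

```python
import math

def f_delta(x, delta):
    """Exponential UA function f_delta(x) = 1 - exp(-x / delta)."""
    return 1.0 - math.exp(-x / delta)

def squared_gap(delta, steps=20_000):
    """Midpoint-rule estimate of the integral of |f_delta(t) - u(t)|^2 over t > 0.

    For t > 0 the unit step equals 1, so the integrand is exp(-2 t / delta).
    The integral is truncated at 40 * delta (an assumption of this sketch),
    where the integrand is already below exp(-80).
    """
    upper = 40.0 * delta
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (f_delta(t, delta) - 1.0) ** 2
    return total * h

for delta in (2.0, 1.0, 0.25):
    # The two printed numbers should nearly coincide for each delta.
    print(delta, squared_gap(delta), delta / 2)
```

Halving $\delta$ halves the squared gap, which is one way to read the statement that $\delta$ controls the quality of the approximation.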

B. The main idea 

Let $F_\delta(\mathbf{X}) = h_\delta(\boldsymbol{\sigma}(\mathbf{X})) =
\sum_{i=1}^{n} f_\delta(\sigma_i(\mathbf{X}))$ denote the rank
approximating function. We replace the original ARM problem with

$$\min_{\mathbf{X}} \Big( F_\delta(\mathbf{X}) = \sum_{i=1}^{n} f_\delta(\sigma_i(\mathbf{X})) \Big) \quad \text{s.t.} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}. \tag{6}$$

When $\delta$ is small, $u(x)$ is well approximated by $f_\delta(x)$.
However, in this case, $F_\delta(\mathbf{X})$ has many local minima.
In contrast, while a larger $\delta$ gives a smoother
$F_\delta(\mathbf{X})$ with poorer approximation quality,
$F_\delta(\mathbf{X})$ then has fewer local minima. In fact, it will
be shown in Theorem 1 that when $\delta \to \infty$, $\delta F_\delta$
tends to a convex function. Consequently, to decrease the chance of
getting trapped in local minima while minimizing
$F_\delta(\mathbf{X})$, instead of initially minimizing it with a
small $\delta$, the ICRA algorithm starts with a large value of
$\delta$ ($\delta \to \infty$). Next, the value of $\delta$ is
decreased gradually, and the solution of the previous iteration is
used as an initial point for minimizing $F_\delta(\mathbf{X})$ at the
current iteration with a new $\delta$. Furthermore, we require the
class of functions $\{f_\delta\}$ to be continuous with respect to
$\delta$. From this continuity, we expect that the minimizers of (6)
for successive iterations, say for $\delta = \delta_i$ and
$\delta_{i+1}$, are close to each other as $\delta$ decreases
gradually and $\delta_{i+1}$ is in the vicinity of $\delta_i$. Thus,
it is more likely that a global minimizer of $F_\delta$ is found. This
technique, which is known as Graduated NonConvexity (GNC) [27], is
used in [15] to solve the affine rank minimization problem.


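C. Properties of $f_\delta(\cdot)$

To efficiently solve (6), we are interested in differentiable RA
functions. The following proposition, which is originally from
[28, Cor. 2.5], characterizes the gradient of $F_\delta(\mathbf{X})$
in terms of the derivative of $f_\delta(\cdot)$.

Proposition 1: Assume that $F : \mathbb{R}^{n_1 \times n_2} \to
\mathbb{R}$ is represented as $F(\mathbf{X}) =
h(\boldsymbol{\sigma}(\mathbf{X}))$. Let $\mathbf{X} = \mathbf{U}
\operatorname{diag}(\boldsymbol{\sigma}(\mathbf{X})) \mathbf{V}^T$
denote the Singular Value Decomposition (SVD) of $\mathbf{X}$. If $h$
is absolutely symmetric,$^1$ then the subdifferential of
$F(\mathbf{X})$ at $\mathbf{X}$ is

$$\partial F(\mathbf{X}) = \{\mathbf{U} \operatorname{diag}(\boldsymbol{\theta}) \mathbf{V}^T \mid \boldsymbol{\theta} \in \partial h(\boldsymbol{\sigma}(\mathbf{X}))\},$$

where $\partial h(\boldsymbol{\sigma}(\mathbf{X}))$ denotes the
subdifferential of $h$ at $\boldsymbol{\sigma}(\mathbf{X})$.

Clearly, under the assumptions of Proposition 1, $f_\delta(\cdot)$
must be an even function. This requirement, as well as other
properties of UA functions, causes $f_\delta(\cdot)$ to be
nondifferentiable at the origin. Therefore, $F_\delta(\mathbf{X})$
becomes nondifferentiable too. This can be seen in another way.
Assuming $n_1 \le n_2$ and $\mathbf{X}\mathbf{X}^T = \mathbf{U}
\operatorname{diag}(\lambda_1, \ldots, \lambda_{n_1}) \mathbf{U}^T$
denoting the Eigenvalue Decomposition (EVD) of
$\mathbf{X}\mathbf{X}^T$, $f_\delta(x) = 1 - e^{-x/\delta}$ induces

$$F_\delta(\mathbf{X}) = \operatorname{trace}\big(\mathbf{I}_{n_1} - e^{-(\mathbf{X}\mathbf{X}^T)^{1/2}/\delta}\big),$$

in which $(\mathbf{X}\mathbf{X}^T)^{1/2} = \mathbf{U}
\operatorname{diag}(\lambda_1^{1/2}, \ldots, \lambda_{n_1}^{1/2})
\mathbf{U}^T$. This reveals that $F_\delta(\mathbf{X})$ is not
differentiable at any non-full-rank matrix. Nevertheless, if the
domain of $F_\delta(\cdot)$ is restricted to the cone of positive
semidefinite matrices, we can ignore the requirement that
$f_\delta(\cdot)$ is symmetric and find concave and differentiable
approximations for the rank using the following propositions.$^2$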

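$^1$ $h(\mathbf{x})$ is absolutely symmetric if it is invariant under
arbitrary permutations and sign changes of the components of
$\mathbf{x}$.

$^2$ Propositions 2 and 3 can be restated under the milder condition
of $\mathbf{Y} \in \mathbb{S}^n$. However, as our approximation for
symmetric matrices relies on the magnitude of eigenvalues, this less
restrictive assumption forces the UA function to be even, making it
again nondifferentiable at the origin.

Proposition 2: Assume that $F : \mathbb{S}^n_+ \to \mathbb{R}$ is
represented as $F(\mathbf{Y}) = h(\boldsymbol{\lambda}(\mathbf{Y})) =
h \circ \boldsymbol{\lambda}(\mathbf{Y})$. If $h : \mathbb{R}^n \to
\mathbb{R}$ is symmetric and concave, then $F(\mathbf{Y})$ is concave.

Proof: The proof follows from [29, Cor. 2.7]. ■

Proposition 3: Suppose that $F : \mathbb{S}^n_+ \to \mathbb{R}$ is
represented as $F(\mathbf{Y}) = h(\boldsymbol{\lambda}(\mathbf{Y})) =
\sum_{i=1}^{n} f(\lambda_i(\mathbf{Y}))$, where $\mathbf{Y} \in
\mathbb{S}^n_+$ with the EVD $\mathbf{Y} = \mathbf{Q}
\operatorname{diag}(\boldsymbol{\lambda}(\mathbf{Y})) \mathbf{Q}^T$,
$h : \mathbb{R}^n \to \mathbb{R}$, and $f : \mathbb{R} \to \mathbb{R}$
is differentiable and concave. Then the gradient of $F(\mathbf{Y})$ at
$\mathbf{Y}$ is

$$\nabla F(\mathbf{Y}) = \mathbf{Q} \operatorname{diag}(\boldsymbol{\theta}) \mathbf{Q}^T, \tag{7}$$

where $\boldsymbol{\theta} = \nabla
h(\boldsymbol{\lambda}(\mathbf{Y}))$ denotes the gradient of $h$ at
$\boldsymbol{\lambda}(\mathbf{Y})$.

Proof: In [29, Thm. 3.2], it is shown that if a function $h$ is
symmetric and the matrix $\mathbf{Y} \in \mathbb{S}^n_+$ has
$\boldsymbol{\lambda}(\mathbf{Y})$ in the domain of $h$, then the
subdifferential of $F$ is given by

$$\partial (h \circ \boldsymbol{\lambda})(\mathbf{Y}) = \{\mathbf{Q} \operatorname{diag}(\boldsymbol{\theta}) \mathbf{Q}^T \mid \boldsymbol{\theta} \in \partial h(\boldsymbol{\lambda}(\mathbf{Y}))\}. \tag{8}$$

Since $h(\boldsymbol{\lambda}(\mathbf{Y})) = \sum_{i=1}^{n}
f(\lambda_i(\mathbf{Y}))$ is differentiable at
$\boldsymbol{\lambda}(\mathbf{Y})$, $\partial
h(\boldsymbol{\lambda}(\mathbf{Y}))$ is a singleton and consequently
$\partial (h \circ \boldsymbol{\lambda})(\mathbf{Y})$ becomes a
singleton. For a convex (concave) function, the subdifferential is a
singleton if and only if the function is differentiable [30]. This
implies that $F(\mathbf{Y})$ is differentiable at $\mathbf{Y}$ with
the above gradient. ■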
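
The gradient formula (7) can be sanity-checked against a finite difference on a small example. The sketch below uses only the standard library and restricts itself to $2 \times 2$ symmetric matrices $[[a, b], [b, d]]$, for which the EVD has a closed form; the function names, the test matrix, the direction `E`, and $\delta = 0.7$ are all hypothetical choices made for illustration, with $f(x) = 1 - e^{-x/\delta}$.

```python
import math

def eig_sym_2x2(a, b, d):
    """Closed-form EVD of [[a, b], [b, d]].

    Returns ((l1, l2), Q) with l1 >= l2 and the eigenvectors as columns of Q.
    """
    mean = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)   # rotation angle that diagonalizes
    c, s = math.cos(theta), math.sin(theta)
    return (mean + rad, mean - rad), [[c, -s], [s, c]]

def F(a, b, d, delta):
    """F(Y) = sum_i f(lambda_i(Y)) with f(x) = 1 - exp(-x / delta)."""
    lam, _ = eig_sym_2x2(a, b, d)
    return sum(1.0 - math.exp(-l / delta) for l in lam)

def grad_F(a, b, d, delta):
    """Gradient from (7): Q diag(f'(lambda_i)) Q^T."""
    lam, Q = eig_sym_2x2(a, b, d)
    th = [math.exp(-l / delta) / delta for l in lam]
    return [[sum(Q[i][k] * th[k] * Q[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]

def directional_fd(a, b, d, e, delta, t=1e-6):
    """Central finite difference of F along the symmetric direction e = (e11, e12, e22)."""
    e11, e12, e22 = e
    plus = F(a + t * e11, b + t * e12, d + t * e22, delta)
    minus = F(a - t * e11, b - t * e12, d - t * e22, delta)
    return (plus - minus) / (2.0 * t)

Y = (2.0, 0.5, 1.0)      # positive definite test matrix (illustrative)
E = (0.3, -0.2, 0.1)     # symmetric perturbation direction (illustrative)
delta = 0.7
G = grad_F(*Y, delta)
inner = G[0][0] * E[0] + 2.0 * G[0][1] * E[1] + G[1][1] * E[2]   # <grad F, E>
print(inner, directional_fd(*Y, E, delta))
```

The two printed values should agree to several digits, since the directional derivative of $F$ along a symmetric direction $\mathbf{E}$ equals $\langle \nabla F(\mathbf{Y}), \mathbf{E} \rangle$.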

Proposition 3 relaxes the differentiability conditions of Proposition
1 by restricting the domain of $F_\delta(\cdot)$. However, we will
show in the following subsection that problem (6) can be converted to
another problem in which the argument of $F_\delta(\cdot)$ is positive
semidefinite. Putting all the required properties of $f_\delta(\cdot)$
together, we are interested in a certain family of UA functions
possessing the following property.

Property 1: Let $f : \mathbb{R} \to \mathbb{R}$ and define
$f_\delta(x) = f(x/\delta)$ for any $\delta > 0$. The class
$\{f_\delta\}$ is said to possess Property 1 if

(a) $f$ is real analytic on $(x_0, \infty)$ for some $x_0 < 0$,

(b) $f$ is strictly concave for $x > 0$ and concave on
$\mathbb{R}$,$^3$

(c) $f(x) = 0 \Leftrightarrow x = 0$,

(d) for $x \ge 0$, $f(x)$ is nondecreasing,

(e) $\lim_{x \to +\infty} f(x) = 1$.

It follows immediately from Property 1 that, for $x \ge 0$,
$\{f_\delta\}$ converges pointwise to the unit step function as
$\delta \to 0^+$; i.e.,

$$\lim_{\delta \to 0^+} f_\delta(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0. \end{cases} \tag{9}$$


In addition to the UA function $f(x) = 1 - e^{-x}$, which is mainly
used in this paper, there are other functions that satisfy the
conditions of Property 1. For example,

$$f(x) = \begin{cases} \dfrac{x}{x+1} & x \ge x_0, \\ -\infty & \text{otherwise}, \end{cases}$$

for some $-1 < x_0 < 0$.
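
The pointwise convergence in (9) can be observed for both example functions. The following is a minimal sketch; the names `f_exp`, `f_rational`, and `scaled`, the value `x0 = -0.5`, and the sample points are illustrative assumptions.

```python
import math

def f_exp(x):
    """UA function f(x) = 1 - exp(-x)."""
    return 1.0 - math.exp(-x)

def f_rational(x, x0=-0.5):
    """UA function f(x) = x / (x + 1) for x >= x0, and -inf otherwise.

    x0 = -0.5 is an arbitrary choice inside (-1, 0).
    """
    return x / (x + 1.0) if x >= x0 else -math.inf

def scaled(f, delta):
    """Return f_delta(x) = f(x / delta)."""
    return lambda x: f(x / delta)

points = (0.0, 0.05, 1.0)
for delta in (1.0, 0.1, 0.001):
    for f in (f_exp, f_rational):
        values = [scaled(f, delta)(x) for x in points]
        print(f.__name__, delta, values)
```

As $\delta$ shrinks, the values at the positive sample points approach 1 while the value at 0 stays at 0. The rational function approaches the step more slowly than the exponential one.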


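D. Optimization of $F_\delta(\cdot)$ for a specific $\delta$

The following lemma from [23] shows that the original ARM problem is
equivalent to

$$\min_{(\mathbf{X}, \mathbf{Y}, \mathbf{Z})} \operatorname{rank}(\mathbf{Y}) + \operatorname{rank}(\mathbf{Z}) \quad \text{s.t.} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}, \ \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0, \tag{10}$$

where $\mathbf{Y} \in \mathbb{S}^{n_1}$ and $\mathbf{Z} \in
\mathbb{S}^{n_2}$.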


$^3$ For most of the analysis presented in this paper, concavity of
$f(\cdot)$ is sufficient, and strict concavity is merely needed to
show that the optimization algorithm used converges to a local
minimum.


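Lemma 1 ([23, Lem. 1]): Let $\mathbf{X} \in \mathbb{R}^{n_1 \times
n_2}$ be an arbitrary matrix. Then $\operatorname{rank}(\mathbf{X})
\le r$ if and only if there exist matrices $\mathbf{Y} \in
\mathbb{S}^{n_1}$ and $\mathbf{Z} \in \mathbb{S}^{n_2}$ such that

$$\operatorname{rank}(\mathbf{Y}) + \operatorname{rank}(\mathbf{Z}) \le 2r, \qquad \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0.$$

$\begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z}
\end{bmatrix} \succeq 0$ implies that $\mathbf{Y} \succeq 0$ and
$\mathbf{Z} \succeq 0$ [31]. Therefore, if
$\operatorname{rank}(\mathbf{Y}) + \operatorname{rank}(\mathbf{Z})$ is
approximated by

$$F_\delta(\mathbf{Y}) + F_\delta(\mathbf{Z}) = \sum_{i=1}^{n_1} f_\delta(\lambda_i(\mathbf{Y})) + \sum_{i=1}^{n_2} f_\delta(\lambda_i(\mathbf{Z})),$$

then, according to Propositions 2 and 3, $F_\delta(\mathbf{Y})$ and
$F_\delta(\mathbf{Z})$ have the desirable concavity and
differentiability properties.

As a result, to extend this approach to arbitrary matrices with a
differentiable and concave RA function,

$$\min_{(\mathbf{X}, \mathbf{Y}, \mathbf{Z})} F_\delta(\mathbf{Y}) + F_\delta(\mathbf{Z}) \quad \text{subject to} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}, \ \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0 \tag{11}$$

is solved to find a solution to (6). A similar approach has been
exploited in [23] to convert (6) for $f_\delta(x) = \log(x + \alpha)$
to$^4$

$$\min_{(\mathbf{X}, \mathbf{Y}, \mathbf{Z})} \log\det(\mathbf{Y} + \alpha \mathbf{I}_{n_1}) + \log\det(\mathbf{Z} + \alpha \mathbf{I}_{n_2}) \quad \text{subject to} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}, \ \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0. \tag{12}$$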

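To solve (11), we use a Majorize-Minimize (MM) technique [32]. In the
MM approach, the original cost function is replaced with a surrogate
function having the following properties. For a function
$h(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}$, $H(\mathbf{x},
\tilde{\mathbf{x}}) : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$
is a surrogate function at $\tilde{\mathbf{x}}$ if it satisfies

$$H(\tilde{\mathbf{x}}, \tilde{\mathbf{x}}) = h(\tilde{\mathbf{x}}), \qquad H(\mathbf{x}, \tilde{\mathbf{x}}) \ge h(\mathbf{x}) \ \text{for all } \mathbf{x}.$$

$H(\mathbf{x}, \tilde{\mathbf{x}})$ is also known as a
tangent-majorant, as the surface $\mathbf{x} \mapsto H(\mathbf{x},
\tilde{\mathbf{x}})$ is tangent to the surface $h(\mathbf{x})$ at
$\tilde{\mathbf{x}}$ and lies above it at other points. The underlying
idea of MM is to iteratively minimize the surrogate function instead
of minimizing the original cost function. More precisely, let
$\mathbf{x}_k$ denote the solution at the $k$th iteration; then
$\mathbf{x}_{k+1}$ is obtained by minimizing the surrogate function at
$\mathbf{x}_k$; that is,

$$\mathbf{x}_{k+1} \in \operatorname*{argmin}_{\mathbf{x} \in \mathcal{F}} H(\mathbf{x}, \mathbf{x}_k),$$

where $\mathcal{F}$ denotes the feasible set of the optimization
problem. Since $h(\mathbf{x}_{k+1}) \le H(\mathbf{x}_{k+1},
\mathbf{x}_k) \le H(\mathbf{x}_k, \mathbf{x}_k) = h(\mathbf{x}_k)$, it
follows that $h(\mathbf{x}_{k+1}) \le h(\mathbf{x}_k)$, proving that
the original cost function is nonincreasing along the iterations.
Naturally, a good choice for a surrogate function is a convex one
which can be easily optimized. In our problem, since
$F_\delta(\mathbf{Y})$ is concave, the first-order concavity condition
implies that

$$F_\delta(\mathbf{Y}) \le F_\delta(\tilde{\mathbf{Y}}) + \langle \mathbf{Y} - \tilde{\mathbf{Y}}, \nabla F_\delta(\tilde{\mathbf{Y}}) \rangle$$

for any $\tilde{\mathbf{Y}}$ in the feasible set. As a result,
$H(\mathbf{Y}, \mathbf{Y}_k) = F_\delta(\mathbf{Y}_k) + \langle
\mathbf{Y} - \mathbf{Y}_k, \nabla F_\delta(\mathbf{Y}_k) \rangle$ is
chosen as a surrogate function for $F_\delta(\mathbf{Y})$. With a tiny
abuse of notation, let, likewise, $H(\mathbf{Z}, \mathbf{Z}_k) =
F_\delta(\mathbf{Z}_k) + \langle \mathbf{Z} - \mathbf{Z}_k, \nabla
F_\delta(\mathbf{Z}_k) \rangle$ denote the surrogate function for
$F_\delta(\mathbf{Z})$. Applying the MM approach, problem (11), for a
fixed $\delta$, can be optimized by iteratively solving

$$(\mathbf{X}_{k+1}, \mathbf{Y}_{k+1}, \mathbf{Z}_{k+1}) = \operatorname*{argmin}_{(\mathbf{X}, \mathbf{Y}, \mathbf{Z})} \langle \nabla F_\delta(\mathbf{Y}_k), \mathbf{Y} \rangle + \langle \nabla F_\delta(\mathbf{Z}_k), \mathbf{Z} \rangle \quad \text{subject to} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}, \ \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0, \tag{13}$$

until convergence. It is easy to verify that the above program is an
SDP, and it is shown in Section III-C that it converges to a local
minimum of (11).

$^4$ For this case, $f_\delta(\cdot)$ does not scale with $\delta$.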
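
To make the MM descent property concrete, the following sketch runs the linearize-then-minimize iteration on a deliberately simplified toy problem, not on the SDP (13) itself. The assumption here is a diagonal variable $\mathbf{Y} = \operatorname{diag}(\mathbf{y})$ with $\mathbf{y} \ge 0$ and a single linear constraint $\mathbf{a}^T \mathbf{y} = b$ with positive $\mathbf{a}$ and $b$, so that each linearized subproblem is a linear program over a simplex whose minimizer is a vertex. All names and numerical values are hypothetical.

```python
import math

def F(y, delta):
    """Concave cost F(y) = sum_i (1 - exp(-y_i / delta)) on y >= 0."""
    return sum(1.0 - math.exp(-v / delta) for v in y)

def grad_F(y, delta):
    """Gradient of F: entries exp(-y_i / delta) / delta."""
    return [math.exp(-v / delta) / delta for v in y]

def mm_step(y, a, b, delta):
    """Minimize the linear surrogate <grad F(y), z> over {z >= 0, a^T z = b}.

    With a > 0 and b > 0 the minimum of a linear cost over this simplex is
    attained at the vertex j with the smallest ratio g_j / a_j.
    """
    g = grad_F(y, delta)
    j = min(range(len(y)), key=lambda i: g[i] / a[i])
    z = [0.0] * len(y)
    z[j] = b / a[j]
    return z

def mm_solve(a, b, delta, y0, tol=1e-12, max_iter=50):
    """Iterate MM steps until the iterate stops changing; record the cost history."""
    y = list(y0)
    history = [F(y, delta)]
    for _ in range(max_iter):
        y_next = mm_step(y, a, b, delta)
        history.append(F(y_next, delta))
        change = max(abs(u - v) for u, v in zip(y, y_next))
        y = y_next
        if change <= tol:
            break
    return y, history

a, b, delta = [1.0, 2.0, 4.0], 4.0, 1.0
y0 = [b / sum(a)] * len(a)        # feasible interior starting point
y, history = mm_solve(a, b, delta, y0)
print(y, history)
```

The recorded cost values never increase from one iteration to the next, which is exactly the inequality chain $h(\mathbf{x}_{k+1}) \le H(\mathbf{x}_{k+1}, \mathbf{x}_k) \le H(\mathbf{x}_k, \mathbf{x}_k) = h(\mathbf{x}_k)$ used by the MM argument. In this toy instance the iterates settle on a vertex with a single nonzero entry, the diagonal analogue of a low-rank solution.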


E. Initialization 

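As pointed out earlier, in the GNC procedure, we initially solve (6)
or (11) for $\delta$ tending to $\infty$. In this case, as shown in
the following theorem, whose proof is given in Section IV-A, (6) and
(11) can be optimized by solving (3).

Theorem 1: For any class of functions $\{f_\delta\}$ possessing
Property 1 and any $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2}$,
$\mathbf{Y} \in \mathbb{S}^{n_1}_+$, $\mathbf{Z} \in
\mathbb{S}^{n_2}_+$,

$$\lim_{\delta \to \infty} \frac{\delta}{\gamma} F_\delta(\mathbf{X}) = \|\mathbf{X}\|_*,$$

$$\lim_{\delta \to \infty} \frac{\delta}{\gamma} \big(F_\delta(\mathbf{Y}) + F_\delta(\mathbf{Z})\big) = \operatorname{trace}(\mathbf{Y}) + \operatorname{trace}(\mathbf{Z}),$$

where $\gamma = f'(0) \ne 0$. Furthermore,

$$\lim_{\delta \to \infty} \operatorname*{argmin}_{\mathbf{X}} \{F_\delta(\mathbf{X}) \mid \mathcal{A}(\mathbf{X}) = \mathbf{b}\} = \operatorname*{argmin}_{\mathbf{X}} \{\|\mathbf{X}\|_* \mid \mathcal{A}(\mathbf{X}) = \mathbf{b}\},$$

provided that NNM has a unique solution.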
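
The first limit in Theorem 1 depends on $\mathbf{X}$ only through its singular values, so it can be illustrated on a list of numbers. The sketch below assumes $f(x) = 1 - e^{-x}$, for which $\gamma = f'(0) = 1$; the singular values are made up for illustration.

```python
import math

def F_delta(sigmas, delta):
    """F_delta evaluated on singular values, with f(x) = 1 - exp(-x)."""
    # -expm1(-t) equals 1 - exp(-t) and stays accurate for tiny t.
    return sum(-math.expm1(-s / delta) for s in sigmas)

sigmas = [3.0, 1.5, 0.2, 0.0]    # hypothetical singular values of X
gamma = 1.0                      # f'(0) for f(x) = 1 - exp(-x)
nuclear_norm = sum(sigmas)

for delta in (1.0, 10.0, 100.0, 1e4):
    scaled_cost = delta / gamma * F_delta(sigmas, delta)
    # The gap to the nuclear norm shrinks roughly like 1 / delta.
    print(delta, scaled_cost, nuclear_norm - scaled_cost)
```

The scaled cost approaches the nuclear norm from below, because $1 - e^{-t} \le t$ for $t \ge 0$.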

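A solution to (3) can be obtained by optimizing the following
equivalent problem [5]:

$$\min_{(\mathbf{X}, \mathbf{Y}, \mathbf{Z})} \operatorname{trace}(\mathbf{Y}) + \operatorname{trace}(\mathbf{Z}) \quad \text{subject to} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}, \ \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0. \tag{14}$$

Accordingly, $\mathbf{X}_0$, $\mathbf{Y}_0$, $\mathbf{Z}_0$ are
initialized by solving (14).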


F. The final algorithm 

Applying all the introduced stages of the algorithm to the UA function
$f_\delta(x) = 1 - e^{-x/\delta}$, the ICRA algorithm is summarized in
Figure 2. In addition, the following remarks give complementary
comments about implementation details by describing algorithm
parameters and their selection rules.

Remark 1. As depicted in Figure 2, $\delta$ is updated as $\delta_i =
c\,\delta_{i-1}$ for $i \ge 1$. We will examine how to choose a
suitable decreasing factor $c$ in Section V in more detail, yet $c \in
(0.1, 0.5)$ is a good choice in general. Furthermore, $\delta_0$ is
set to $8\sigma_1(\mathbf{X}_0)$ because it is easy to verify that
$1 - e^{-\sigma_i(\mathbf{X}_0)/\delta_0}$ is closely approximated by
$\sigma_i(\mathbf{X}_0)/\delta_0$ with this choice of $\delta_0$.
Hence, this $\delta_0$ acts as if it tends to $\infty$.
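
A small numerical check of Remark 1 is sketched below. With $\delta_0 = 8\sigma_1(\mathbf{X}_0)$, every ratio $\sigma_i(\mathbf{X}_0)/\delta_0$ is at most $1/8$, and the relative gap between $1 - e^{-t}$ and $t$ on that range can be evaluated directly. The function names, the value `c = 0.3`, and the fixed number of stages are illustrative assumptions (the algorithm itself stops on a distance criterion, not after a fixed number of stages).

```python
import math

def delta_schedule(sigma_max, c=0.3, num_stages=5):
    """Geometric schedule: delta_0 = 8 * sigma_max, then delta_i = c * delta_{i-1}.

    num_stages is a hypothetical cap used only for this illustration.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie in (0, 1)")
    deltas = [8.0 * sigma_max]
    for _ in range(num_stages - 1):
        deltas.append(c * deltas[-1])
    return deltas

def linearization_error(sigma, delta):
    """Relative gap between sigma/delta and 1 - exp(-sigma/delta)."""
    t = sigma / delta
    return (t + math.expm1(-t)) / t

sigma_max = 2.0                      # hypothetical largest singular value of X_0
deltas = delta_schedule(sigma_max)
print(deltas)
print(linearization_error(sigma_max, deltas[0]))   # worst case at t = 1/8
```

The worst-case relative gap at $t = 1/8$ is below $t/2 = 6.25\%$, and it is smaller for the remaining singular values, which is the sense in which the initial cost behaves like a scaled nuclear norm.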

Remark 2. $d_1 = \|\mathbf{X}_{i+1} - \mathbf{X}_i\|_F /
\|\mathbf{X}_i\|_F$ and $d_2 = \|\mathbf{X}_{j+1} - \mathbf{X}_j\|_F /
\|\mathbf{X}_j\|_F$, as measures of distance between the results of
successive iterations, are used to stop execution of the external and
internal loops, respectively. Moreover, $\epsilon_1$ and $\epsilon_2$
are usually set to $10^{-2}$, so that $\mathbf{X}_{i+1}$ and
$\mathbf{X}_{j+1}$ settle within a 1% relative distance of the
previous solutions $\mathbf{X}_i$ and $\mathbf{X}_j$.
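
The stopping measures of Remark 2 amount to a relative Frobenius distance between successive iterates. A minimal sketch follows, with matrices stored as nested lists; the helper names and the example matrices are hypothetical.

```python
import math

def fro_norm(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in M for v in row))

def relative_change(X_new, X_old):
    """d = ||X_new - X_old||_F / ||X_old||_F (X_old is assumed nonzero)."""
    diff = [[p - q for p, q in zip(r_new, r_old)]
            for r_new, r_old in zip(X_new, X_old)]
    return fro_norm(diff) / fro_norm(X_old)

def has_converged(X_new, X_old, eps=1e-2):
    """A loop of the form 'while d > eps' stops once d <= eps."""
    return relative_change(X_new, X_old) <= eps

X_old = [[1.0, 0.0], [0.0, 1.0]]
print(has_converged([[1.005, 0.0], [0.0, 1.0]], X_old))   # small update
print(has_converged([[1.1, 0.0], [0.0, 1.0]], X_old))     # larger update
```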


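Input: $\mathcal{A}(\cdot)$, $\mathbf{b}$, $f_\delta(\cdot)$

Initialization:
    $\mathbf{X}_0 = \operatorname{argmin}_{\mathbf{X}} \{\|\mathbf{X}\|_* \mid \mathcal{A}(\mathbf{X}) = \mathbf{b}\}$.
    $\delta_0 = 8\sigma_1(\mathbf{X}_0)$.
    $c$: decreasing factor for $\delta$.
    $\epsilon_1, \epsilon_2$: stopping thresholds for main and internal loops.

Body:
    $i = 0$, $\delta = \delta_0$.
    while $d_1 > \epsilon_1$ do
        $j = 0$, $\mathbf{X}_0 = \mathbf{X}_i$.
        while $d_2 > \epsilon_2$ do
            $(\mathbf{X}_{j+1}, \mathbf{Y}_{j+1}, \mathbf{Z}_{j+1}) = \operatorname{argmin}_{(\mathbf{X}, \mathbf{Y}, \mathbf{Z})} \langle \nabla F_\delta(\mathbf{Y}_j), \mathbf{Y} \rangle + \langle \nabla F_\delta(\mathbf{Z}_j), \mathbf{Z} \rangle$
                subject to $\mathcal{A}(\mathbf{X}) = \mathbf{b}$, $\begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0$.
            $d_2 = \|\mathbf{X}_{j+1} - \mathbf{X}_j\|_F / \|\mathbf{X}_j\|_F$.
            $j = j + 1$.
        end while
        $\mathbf{X}_{i+1} = \mathbf{X}_j$.
        $d_1 = \|\mathbf{X}_{i+1} - \mathbf{X}_i\|_F / \|\mathbf{X}_i\|_F$.
        $i = i + 1$, $\delta = c\,\delta$.
    end while

Output: $\mathbf{X}_i$

Fig. 2. The ICRA Algorithm.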


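Remark 3. For $f_\delta(x) = 1 - e^{-x/\delta}$, the gradients of
$F_\delta(\mathbf{Y}_j)$ and $F_\delta(\mathbf{Z}_j)$ are given by

$$\nabla F_\delta(\mathbf{Y}_j) = \frac{1}{\delta} \mathbf{P} \operatorname{diag}\big(e^{-\lambda_1(\mathbf{Y}_j)/\delta}, \ldots, e^{-\lambda_{n_1}(\mathbf{Y}_j)/\delta}\big) \mathbf{P}^T,$$

$$\nabla F_\delta(\mathbf{Z}_j) = \frac{1}{\delta} \mathbf{Q} \operatorname{diag}\big(e^{-\lambda_1(\mathbf{Z}_j)/\delta}, \ldots, e^{-\lambda_{n_2}(\mathbf{Z}_j)/\delta}\big) \mathbf{Q}^T,$$

where $\mathbf{P} \operatorname{diag}(\boldsymbol{\lambda}(\mathbf{Y}_j))
\mathbf{P}^T$ and $\mathbf{Q}
\operatorname{diag}(\boldsymbol{\lambda}(\mathbf{Z}_j)) \mathbf{Q}^T$
denote the EVD of $\mathbf{Y}_j$ and $\mathbf{Z}_j$, respectively.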

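Remark 4. Following the same argument as in [33], problem (13) can be
cast as a reweighted nuclear norm minimization; i.e.,

$$\mathbf{X}_{k+1} = \operatorname*{argmin}_{\mathbf{X}} \|\mathbf{W}_1^k \mathbf{X} \mathbf{W}_2^k\|_* \quad \text{s.t.} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}.$$

If $\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T$ denotes the SVD of
$\mathbf{W}_1^k \mathbf{X}_{k+1} \mathbf{W}_2^k$, then the weighting
matrices as well as $\mathbf{Y}_{k+1}$, $\mathbf{Z}_{k+1}$ are updated
by

$$\mathbf{Y}_{k+1} = (\mathbf{W}_1^k)^{-1} \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^T (\mathbf{W}_1^k)^{-1},$$

$$\mathbf{Z}_{k+1} = (\mathbf{W}_2^k)^{-1} \mathbf{V} \boldsymbol{\Sigma} \mathbf{V}^T (\mathbf{W}_2^k)^{-1},$$

$$\mathbf{W}_1^{k+1} = \big(\nabla F_\delta(\mathbf{Y}_{k+1})\big)^{1/2}, \qquad \mathbf{W}_2^{k+1} = \big(\nabla F_\delta(\mathbf{Z}_{k+1})\big)^{1/2}.$$

There are efficient solvers for NNM like FPCA [10] and APG [11]. As a
result, one can exploit these algorithms to solve (13) more
efficiently than with SDP.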

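Remark 5. (6) can be generalized to the following setting to take
into account the noise in measurements:

$$\min_{\mathbf{X}} F_\delta(\mathbf{X}) \quad \text{subject to} \quad \|\mathcal{A}(\mathbf{X}) - \mathbf{b}\|_2 \le \epsilon. \tag{15}$$

Consequently, the following program can be solved instead of (13):

$$(\mathbf{X}_{k+1}, \mathbf{Y}_{k+1}, \mathbf{Z}_{k+1}) = \operatorname*{argmin}_{(\mathbf{X}, \mathbf{Y}, \mathbf{Z})} \langle \nabla F_\delta(\mathbf{Y}_k), \mathbf{Y} \rangle + \langle \nabla F_\delta(\mathbf{Z}_k), \mathbf{Z} \rangle \quad \text{subject to} \quad \|\mathcal{A}(\mathbf{X}) - \mathbf{b}\|_2 \le \epsilon, \ \begin{bmatrix} \mathbf{Y} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{Z} \end{bmatrix} \succeq 0. \tag{16}$$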


III. Performance Analysis

In this section, we analyze the performance of the ICRA algorithm in
recovery of low-rank matrices. First, in Section III-A, a necessary
and sufficient condition for exact recovery of (6) is presented. The
sufficient condition is based on null-space properties of the
measurement operator. Next, exploiting results established in [15], in
Section III-B we prove that the sequence of minimizers of (6), for a
decreasing sequence of $\delta$, converges to the minimum rank
solution. We will not discuss the issue of global convergence;
instead, it is shown that if the MM approach is applied, program (13)
converges, at least, to a local minimizer of (11).

A. Uniqueness

One simple way to characterize the conditions under which a method can
successfully find the exact solution in both sparse vector and
low-rank matrix recovery from underdetermined linear measurements is
to use null-space properties of the measurement operator. In the
vector case, for a general function inducing a 'sparsity measure', a
necessary and sufficient condition for exact recovery is derived in
[34]. Here, we generalize some results of [34] to low-rank matrix
recovery and introduce a necessary and sufficient condition for the
success of (6). Furthermore, it is shown that global optimization of
(6) uniquely recovers matrices of ranks higher than or equal to those
uniquely recoverable by NNM. The proofs of the following lemmas and
theorem are given in Section IV-B.

The results of the next two lemmas are valid for not only
$f_\delta(x) = f(x/\delta)$ in (6) but also any $f : \mathbb{R} \to
\mathbb{R}$ which is used in

$$\min_{\mathbf{X}} \Big( F(\mathbf{X}) = \sum_{i=1}^{n} f(\sigma_i(\mathbf{X})) \Big) \quad \text{subject to} \quad \mathcal{A}(\mathbf{X}) = \mathbf{b}$$

to recover a low-rank matrix.

Lemma 2: Every matrix $\mathbf{X} \in \mathbb{R}^{n_1 \times n_2}$ of
rank at most $r$ can be uniquely recovered using (6) for any $f$
possessing Property 1(b) to 1(d), if, $\forall \mathbf{W} \in
\mathcal{N}(\mathcal{A}) \setminus \{\mathbf{0}\}$,

$$\sum_{i=1}^{r} f(\sigma_i(\mathbf{W})) < \sum_{i=r+1}^{n} f(\sigma_i(\mathbf{W})).$$

In general, extending Lemma 2 to noisy rank minimization is not
straightforward. In fact, even in the vector case, robust recovery
conditions (RRC)$^5$ for a sparsity measure have been derived only for
the $\ell_p$ quasi-norm [35]. Nevertheless, a recent work [35] proves
that, under some mild assumptions, the sets of measurement matrices
satisfying exact recovery conditions and RRC differ by a set of
measure zero. Accordingly, recalling the strong parallels between RM
and $\ell_0$-minimization [5], roughly speaking, we expect that under
the same conditions as in Lemma 2, (15) can recover matrices close to
the solutions of (4) in the Frobenius-norm sense.

$^5$ The so-called RRC guarantees stable recovery of sparse vectors
from noisy measurements using minimization of a sparsity measure
inducing function.

Lemma 3: Under the same assumptions on $f$ as in Lemma 2, if, for some
$\mathbf{W} \in \mathcal{N}(\mathcal{A}) \setminus \{\mathbf{0}\}$,

$$\sum_{i=1}^{r} f(\sigma_i(\mathbf{W})) \ge \sum_{i=r+1}^{n} f(\sigma_i(\mathbf{W})), \tag{17}$$

then there exist $\mathbf{X}$ and $\mathbf{X}'$ such that
$\operatorname{rank}(\mathbf{X}) \le r$, $\mathcal{A}(\mathbf{X}) =
\mathcal{A}(\mathbf{X}')$, and $F(\mathbf{X}') \le F(\mathbf{X})$.

The sufficient condition in Lemma 2 can also be described by the
following inequality:

$$2 \sum_{i=1}^{r} f(\sigma_i(\mathbf{W})) < \sum_{i=1}^{n} f(\sigma_i(\mathbf{W})).$$

As a result, if we define

$$\theta_f(r, \mathcal{A}) \triangleq \sup_{\mathbf{W} \in \mathcal{N}(\mathcal{A}) \setminus \{\mathbf{0}\}} \frac{\sum_{i=1}^{r} f(\sigma_i(\mathbf{W}))}{\sum_{i=1}^{n} f(\sigma_i(\mathbf{W}))},$$

the uniqueness can be characterized as: all matrices of rank at most
$r$ are uniquely recovered by (6) if $\theta_f(r, \mathcal{A}) < 1/2$.
In fact, $\theta_f$ extends a similar parameter defined in [34] for
$\ell_1$-norm minimization.

Let $r^*_{f_\delta}(\mathcal{A})$ denote the maximum rank such that
all matrices $\mathbf{X}$ with $\operatorname{rank}(\mathbf{X}) \le
r^*_{f_\delta}(\mathcal{A})$ can be uniquely recovered by (6). In
particular, $r^*_{\mathrm{rm}}(\mathcal{A})$ and
$r^*_{\mathrm{nnm}}(\mathcal{A})$ are the corresponding values for
$f_\delta(x) = u(x)$ and $f_\delta(x) = x$; that is, the original rank
minimization problem, (1), and nuclear norm minimization, (3). Then we
have the following result.

Theorem 2: For any $f_\delta(\cdot)$ possessing Property 1,

$$r^*_{\mathrm{nnm}}(\mathcal{A}) \le r^*_{f_\delta}(\mathcal{A}) \le r^*_{\mathrm{rm}}(\mathcal{A}).$$

B. Convergence to the rank function


The following definition, which like $\theta_f(r, \mathcal{A})$
depends on the null space of $\mathcal{A}$, is used to show that when
$\delta \to 0$, the solution of (6) tends toward the minimum rank
solution of (1). In other words, in order to get arbitrarily close to
the minimum rank solution, it is sufficient to solve (6) for a
properly chosen $\delta$ which depends on the employed UA function.

Definition 1 (Spherical Section Property [8], [36]): The linear
operator $\mathcal{A}$ possesses the $\Delta$-spherical section
property if, for all $\mathbf{W} \in \mathcal{N}(\mathcal{A})
\setminus \{\mathbf{0}\}$, $\|\mathbf{W}\|_*^2 / \|\mathbf{W}\|_F^2
\ge \Delta(\mathcal{A})$. In other words, the spherical section
constant of the linear operator $\mathcal{A}$ is defined as

$$\Delta(\mathcal{A}) \triangleq \min_{\mathbf{W} \in \mathcal{N}(\mathcal{A}) \setminus \{\mathbf{0}\}} \frac{\|\mathbf{W}\|_*^2}{\|\mathbf{W}\|_F^2}.$$

The following proposition is originally from [15, Thm. 4]. Although
different assumptions were imposed on the UA functions in the proof of
[15], the authors merely used properties that are common to our
assumptions, making the result applicable also to our analysis.
result applicable also to our analysis. 
Proposition 4: Assume $\mathcal{A}$ has the $\Delta$-spherical section
property and $\{f_\delta\}$ possesses Property 1. Let $\mathbf{X}_0$
be the unique solution to (1) and let $\mathbf{X}_\delta$ denote a
solution to (6). Then


||X a - XqIIjt < 


nag 


where ag = \ f s 1 (1 — /) |, and, consequently. 


$$\lim_{\delta \to 0^+} \mathbf{X}_\delta = \mathbf{X}_0.$$

This result is of particular interest since the best result available
for NNM shows that if $\operatorname{rank}(\mathbf{X}) < \Delta/4$,
then $\mathbf{X}$ can be uniquely recovered [18], which is more
restrictive than $\operatorname{rank}(\mathbf{X}) < \Delta/2$, a
sufficient condition for the uniqueness of the solution of (1).
However, the above proposition proves that we can find an accurate
estimate of the original solution whether it is recoverable by NNM or
not.


C. Convergence Analysis 

The next theorem, whose proof is left to Section IV-C, proves that the
MM approach proposed in (13) to solve (11) will find a local minimizer
of (11).

Theorem 3: The sequence $\{(\mathbf{X}_k, \mathbf{Y}_k,
\mathbf{Z}_k)\}$ is convergent to a local minimizer of (11).

IV. Proofs 

A. Proof of Theorem [7] 

Proof: Using the Taylor expansion, /(•) can be formu¬ 
lated as 

/(s) = 7 s + g(s), 

where 7 = /'( 0 ) and 


lim AA = n. 

s->0 S 


(18) 


7 cannot be 0 because the first-order concavity condition 
implies that, for any x > 0 , 

/( x) < /( 0 ) + xf'(0) = 7 ®, 

and 7 = 0 converts the above inequality to f(x) < 0 which 
contradicts Property [T| Now, Fg{-) can be represented as 


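$$\begin{aligned}
F_\delta(X) &= \sum_{i=1}^{n} f_\delta(\sigma_i(X)) \\
&= \frac{\gamma}{\delta}\|X\|_* + \sum_{i=1}^{n} g(\sigma_i(X)/\delta).
\end{aligned} \tag{19}$$

(19) can be reformulated as

$$\frac{\delta}{\gamma} F_\delta(X) = \|X\|_* + \frac{1}{\gamma}\sum_{i=1}^{n} \sigma_i(X)\, \frac{g(\sigma_i(X)/\delta)}{\sigma_i(X)/\delta}. \tag{20}$$

By virtue of (18), it follows that

$$\lim_{\delta \to \infty} \frac{\delta}{\gamma} F_\delta(X) = \|X\|_*.$$

Following the same line of argument, it can be easily verified
that

$$\lim_{\delta \to \infty} \frac{\delta}{\gamma} \big(F_\delta(Y) + F_\delta(Z)\big) = \mathrm{trace}(Y) + \mathrm{trace}(Z).$$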


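To prove the second part, let

$$\widehat{X} = \operatorname*{argmin}_{X} \{\|X\|_* \mid \mathcal{A}(X) = b\},$$

$$X_\delta = \operatorname*{argmin}_{X} \{F_\delta(X) \mid \mathcal{A}(X) = b\}.$$

From (20) and the inequality

$$\Big|\sum_{i=1}^{n} x_i y_i\Big| \le \Big(\sum_{i=1}^{n} |x_i|\Big)\Big(\sum_{i=1}^{n} |y_i|\Big),$$

we have

$$\delta F_\delta(X) \le \|X\|_* \Big(\gamma + \sum_{i=1}^{n} \frac{|g(\sigma_i(X)/\delta)|}{\sigma_i(X)/\delta}\Big),$$

$$\delta F_\delta(X) \ge \|X\|_* \Big(\gamma - \sum_{i=1}^{n} \frac{|g(\sigma_i(X)/\delta)|}{\sigma_i(X)/\delta}\Big).$$

The above inequalities as well as (18) imply that $\forall \epsilon > 0$, $\exists \delta_0$
such that $\forall \delta > \delta_0$,

$$\gamma - \epsilon \le \frac{\delta F_\delta(X)}{\|X\|_*} \le \gamma + \epsilon.$$

$X_\delta$ is a solution to (6), so $\delta F_\delta(X_\delta) \le \delta F_\delta(\widehat{X})$. Furthermore,
we have $\|\widehat{X}\|_* \le \|X_\delta\|_*$ since $\widehat{X}$ is the unique solution of (3).
Therefore, for $\epsilon < \gamma$, we obtain

$$(\gamma - \epsilon)\|\widehat{X}\|_* \le (\gamma - \epsilon)\|X_\delta\|_* \le \delta F_\delta(X_\delta) \le \delta F_\delta(\widehat{X}) \le (\gamma + \epsilon)\|\widehat{X}\|_*,$$

which proves that $\lim_{\delta \to \infty} \|X_\delta\|_* = \|\widehat{X}\|_*$. As $\widehat{X}$ is the
unique solution to (3) (under the same equality constraints), it
can be concluded that $\lim_{\delta \to \infty} X_\delta = \widehat{X}$. ■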
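
The first limit in the proof above can be illustrated numerically. The following is a minimal sketch, assuming the UA function $f(x) = 1 - e^{-x}$ used in the experiments (so that $\gamma = f'(0) = 1$); the helper name `scaled_rank_approx` and the test matrix are illustrative choices, not part of the original algorithm.

```python
import numpy as np

def f(x):
    # UA function assumed here: f(x) = 1 - exp(-x), written with expm1 for accuracy
    return -np.expm1(-x)

GAMMA = 1.0  # gamma = f'(0) for this particular choice of f

def scaled_rank_approx(X, delta):
    """Return (delta / gamma) * F_delta(X), where F_delta(X) = sum_i f(sigma_i(X) / delta)."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return (delta / GAMMA) * np.sum(f(sigma / delta))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))            # arbitrary test matrix
nuclear_norm = np.linalg.svd(X, compute_uv=False).sum()

# Gap between the scaled approximation and the nuclear norm for growing delta
deltas = (1.0, 10.0, 100.0, 1000.0)
gaps = [abs(scaled_rank_approx(X, d) - nuclear_norm) for d in deltas]
print(gaps)
```

For this $f$, the gap behaves roughly like $\|X\|_F^2 / (2\delta)$ for large $\delta$, which is consistent with the second-order term of the Taylor expansion.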


B. Proofs of Propositions [2] and [I] and Theorem [2] 

Before proofs, we need the following definition, corollary, 
and lemmas. 

Definition 2 ([37]): A function $\Phi(x) : \mathbb{R}^n \to \mathbb{R}$ is called
symmetric gauge if it is a norm on $\mathbb{R}^n$ and absolutely
symmetric.

Lemma 4 ([38, Cor. 2.3]): Let $\Phi$ be a symmetric gauge
function and $f : [0, \infty) \to [0, \infty)$ be a concave function with
$f(0) = 0$. Then for $A, B \in \mathbb{R}^{n_1 \times n_2}$,

$$\Phi\big(f(\sigma(A)) - f(\sigma(B))\big) \le \Phi\big(f(\sigma(A - B))\big),$$

where $f(x) = (f(x_1), \ldots, f(x_n))^T$.

Lemma 5: For any function $f$ possessing Property 1, $f(x)/x$
is nonincreasing for $x > 0$.

Proof: Let $g(x) = f(x)/x$. It is sufficient to show that
$g'(x) = (x f'(x) - f(x))/x^2$ is nonpositive for $x > 0$. $f(x)$
is concave, so we can write

$$f(0) \le f(x) + (0 - x) f'(x)$$

for any $x > 0$. Since $f(0) = 0$, this gives $x f'(x) - f(x) \le 0$,
which proves that $g'(x) \le 0$. ■

Corollary 1: Let $A, B \in \mathbb{R}^{n_1 \times n_2}$. For any $f$ possessing
Property 1(b) to 1(d),

$$\sum_{i=1}^{n} f(\sigma_i(A - B)) \ge \sum_{i=1}^{n} |f(\sigma_i(A)) - f(\sigma_i(B))|. \tag{21}$$














Proof: $\Phi(x) = \sum_{i=1}^{n} |x_i|$ and $f(\cdot)$ satisfy the conditions of
Lemma 4; thus, (21) immediately follows. ■
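
As a sanity check, inequality (21) can be probed on random matrices. This is only a sketch under the assumption that $f(x) = 1 - e^{-x}$ (the UA function of the experiments) satisfies the required properties; the matrix sizes and the number of trials are arbitrary.

```python
import numpy as np

def f(x):
    # Assumed UA function: concave, increasing, f(0) = 0
    return -np.expm1(-x)

def sv(M):
    # Singular values, sorted in descending order by NumPy
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(1)
worst_margin = np.inf
for _ in range(1000):
    A = rng.standard_normal((6, 4))
    B = rng.standard_normal((6, 4))
    lhs = np.sum(f(sv(A - B)))                    # left-hand side of (21)
    rhs = np.sum(np.abs(f(sv(A)) - f(sv(B))))     # right-hand side of (21)
    worst_margin = min(worst_margin, lhs - rhs)

# A nonnegative value (up to rounding) is consistent with (21)
print(worst_margin)
```

A random search of this kind cannot prove the corollary; it only fails to find a counterexample.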

Proof of Lemma 2: The proof is similar to [7, Lem. 6]
and extends the uniqueness condition from NNM to a larger class
of functions possessing Property 1(b) to 1(d). Assuming
$\mathcal{A}(X) = b$, all feasible solutions to (6) can be formulated
as $X + W$ for some $W \in \mathcal{N}(\mathcal{A})$. To show that $X$ is the unique
solution to (6), it is sufficient to prove that, $\forall W \in \mathcal{N}(\mathcal{A}) \setminus \{0\}$,
$F(X + W) > F(X)$. Starting from Corollary 1, we can write
that

$$\begin{aligned}
F(X + W) &= \sum_{i=1}^{n} f(\sigma_i(X + W)) \\
&\ge \sum_{i=1}^{n} |f(\sigma_i(X)) - f(\sigma_i(W))| \\
&= \sum_{i=1}^{r} |f(\sigma_i(X)) - f(\sigma_i(W))| + \sum_{i=r+1}^{n} f(\sigma_i(W)) \\
&\ge \sum_{i=1}^{r} \big(f(\sigma_i(X)) - f(\sigma_i(W))\big) + \sum_{i=r+1}^{n} f(\sigma_i(W)) \\
&> \sum_{i=1}^{r} f(\sigma_i(X)) = F(X),
\end{aligned}$$

which completes the proof. ■

Proof of Lemma [ij Let 

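$$W = U\, \mathrm{diag}(\sigma_1, \ldots, \sigma_n)\, V^T$$

denote the SVD of $W$. Choose

$$\begin{aligned}
X &= -U\, \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0)\, V^T, \\
X' &= U\, \mathrm{diag}(0, \ldots, 0, \sigma_{r+1}, \ldots, \sigma_n)\, V^T.
\end{aligned}$$

Obviously, $W = X' - X$, $\mathcal{A}(X) = \mathcal{A}(X')$, and $\mathrm{rank}(X) \le r$.
On the other hand, (17) implies that

$$F(X') = \sum_{i=r+1}^{n} f(\sigma_i(W)) \le \sum_{i=1}^{r} f(\sigma_i(W)) = F(X).$$

■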

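Proof of Theorem 2: Lemma 5 implies that, for $x > 0$,
$f_\delta(x)/x$ is nonincreasing. Hence, following a similar argument
as in [34, Thm. 5], one can easily verify that, for any $W \ne 0$,
$\sum_{i=1}^{r} \sigma_i(W) / \sum_{i=1}^{r} f_\delta(\sigma_i(W))$ is a nonincreasing sequence
in $r$. Consequently,

$$\frac{\sum_{i=1}^{r} \sigma_i(W)}{\sum_{i=1}^{r} f_\delta(\sigma_i(W))} \ge \frac{\sum_{i=1}^{n} \sigma_i(W)}{\sum_{i=1}^{n} f_\delta(\sigma_i(W))},$$

or,

$$\frac{\sum_{i=1}^{r} f_\delta(\sigma_i(W))}{\sum_{i=1}^{n} f_\delta(\sigma_i(W))} \le \frac{\sum_{i=1}^{r} \sigma_i(W)}{\sum_{i=1}^{n} \sigma_i(W)},$$

which shows $\theta_{f_\delta}(r, \mathcal{A}) \le \theta_{\mathrm{nnm}}(r, \mathcal{A})$ for any $r \le n$.
Both $\theta_{f_\delta}(r, \mathcal{A})$ and $\theta_{\mathrm{nnm}}(r, \mathcal{A})$ are increasing in $r$, so it can be
concluded that $r^*_{f_\delta}(\mathcal{A}) \ge r^*_{\mathrm{nnm}}(\mathcal{A})$. Similarly, it can be shown
that $\sum_{i=1}^{r} u(\sigma_i(W)) / \sum_{i=1}^{r} f_\delta(\sigma_i(W))$ is a nondecreasing
sequence, and

$$\frac{\sum_{i=1}^{r} u(\sigma_i(W))}{\sum_{i=1}^{n} u(\sigma_i(W))} \le \frac{\sum_{i=1}^{r} f_\delta(\sigma_i(W))}{\sum_{i=1}^{n} f_\delta(\sigma_i(W))},$$

confirming that $r^*_{\mathrm{arm}}(\mathcal{A}) \ge r^*_{f_\delta}(\mathcal{A})$. ■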


C. Proof of Theorem 3

We start with the following lemmas. The first lemma is
originally from [39, Lem. II.1].

Lemma 6 ([39, Lem. II.1]): Let $A, B \in \mathbb{S}^n$; then

$$\sum_{i=1}^{n} \lambda_{n-i+1}(A)\lambda_i(B) \le \mathrm{trace}(AB) \le \sum_{i=1}^{n} \lambda_i(A)\lambda_i(B).$$

Lemma 7: Assume that $F : \mathbb{S}^n \to \mathbb{R}$ is represented as
$F(X) = h(\lambda(X)) = \sum_{i=1}^{n} f(\lambda_i(X))$ in which $f : \mathbb{R} \to \mathbb{R}$. If
$f(\cdot)$ is twice differentiable and strictly concave, then $F(X)$ is
strictly concave, and there is some $m > 0$ such that, for any
bounded $X, Y \in \mathbb{S}^n$, $X \ne Y$,

$$F(Y) - F(X) \le \langle Y - X, \nabla F(X) \rangle - \frac{m}{2}\|Y - X\|_F^2. \tag{22}$$

Proof: First, it is shown that $F(\cdot)$ is strictly concave, then
(22) follows as a result. To this end, notice that strict concavity
of $f(\cdot)$ implies that $h(\cdot)$ is strictly concave too. From the
first-order concavity condition, it is known that $h$ is strictly concave
if and only if, for any $x \ne y$,

$$h(y) < h(x) + \langle y - x, \nabla h(x) \rangle.$$

Propositions 2 and 5 together imply that $F(\cdot)$ is differentiable.
Thus, substituting $x, y$ with $\lambda^{\downarrow}(X), \lambda^{\downarrow}(Y)$ in the above
inequality gives

$$F(Y) < F(X) + \langle \lambda^{\downarrow}(Y) - \lambda^{\downarrow}(X), \nabla h(\lambda^{\downarrow}(X)) \rangle. \tag{23}$$

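Let $X = U\, \mathrm{diag}(\lambda^{\downarrow}(X))\, U^T$ denote the EVD of $X$. Applying
Proposition 5 on $F(\cdot)$ yields

$$\begin{aligned}
\nabla F(X) &= U\, \mathrm{diag}\big(f'(\lambda_1(X)), \ldots, f'(\lambda_n(X))\big)\, U^T \\
&= U\, \mathrm{diag}\big(\nabla h(\lambda^{\downarrow}(X))\big)\, U^T.
\end{aligned}$$

Therefore,

$$\begin{aligned}
\langle X, \nabla F(X) \rangle &= \mathrm{trace}\big(\mathrm{diag}(\lambda^{\downarrow}(X))\, \mathrm{diag}(\nabla h(\lambda^{\downarrow}(X)))\big) \\
&= \langle \lambda^{\downarrow}(X), \nabla h(\lambda^{\downarrow}(X)) \rangle.
\end{aligned} \tag{24}$$

Also,

$$\langle Y, \nabla F(X) \rangle = \mathrm{trace}(Y \nabla F(X)) \overset{(a)}{\ge} \langle \lambda^{\downarrow}(Y), \lambda^{\uparrow}(\nabla F(X)) \rangle,$$

where (a) follows from Lemma 6. Since $f(\cdot)$ is strictly
concave, $f'(\cdot)$ is decreasing and $f'(\lambda_i(X)) \ge f'(\lambda_j(X))$ for
$i > j$. Therefore, $\lambda^{\uparrow}(\nabla F(X)) = \nabla h(\lambda^{\downarrow}(X))$, and the above
inequality becomes

$$\langle Y, \nabla F(X) \rangle \ge \langle \lambda^{\downarrow}(Y), \nabla h(\lambda^{\downarrow}(X)) \rangle. \tag{25}$$

Substituting (24) and (25) in (23), we obtain

$$F(Y) < F(X) + \langle Y - X, \nabla F(X) \rangle, \tag{26}$$

which shows that $F(\cdot)$ is strictly concave.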

The Hessian of $h(x)$ is given by

$$\nabla^2 h(x) = \mathrm{diag}\big(f''(x_1), \ldots, f''(x_n)\big).$$

As $f(\cdot)$ is strictly concave, for any bounded $U > 0$, there
is an $m' > 0$ such that $f''(x) \le -m'$ for any $|x| \le U$, and
it follows that $\nabla^2 h(x) \preceq -m' I$ for all $x$ with $\|x\|_\infty \le U$.
Further, assuming $\|x\|_\infty, \|y\|_\infty \le U$, we have

$$h(y) = h(x) + \langle y - x, \nabla h(x) \rangle + \frac{1}{2}(y - x)^T \nabla^2 h(z)(y - x)$$

for some $z$ in the line segment connecting $x$ and $y$. Using
$\nabla^2 h(z) \preceq -m' I$, we get

$$h(y) \le h(x) + \langle y - x, \nabla h(x) \rangle - \frac{m'}{2}\|y - x\|_2^2.$$

Similarly, for the function $F(\cdot)$, which is strictly concave, there
is some $m > 0$ such that for any bounded $X, Y \in \mathbb{S}^n$, $X \ne Y$,

$$F(Y) - F(X) \le \langle Y - X, \nabla F(X) \rangle - \frac{m}{2}\|Y - X\|_F^2,$$

which completes the proof. ■
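
The gradient formula $\nabla F(X) = U\, \mathrm{diag}(f'(\lambda_i(X)))\, U^T$ used in this proof can be compared against a finite-difference directional derivative. The sketch below assumes $f(x) = 1 - e^{-x}$ and uses hypothetical helper names (`F`, `grad_F`); it is a numerical illustration only.

```python
import numpy as np

def f(x):
    return -np.expm1(-x)           # assumed f(x) = 1 - exp(-x)

def f_prime(x):
    return np.exp(-x)              # derivative of the assumed f

def F(X):
    # Spectral function F(X) = sum_i f(lambda_i(X)) for symmetric X
    return np.sum(f(np.linalg.eigvalsh(X)))

def grad_F(X):
    # U diag(f'(lambda)) U^T, built by scaling the columns of U
    lam, U = np.linalg.eigh(X)
    return (U * f_prime(lam)) @ U.T

rng = np.random.default_rng(2)
n = 5
G = rng.standard_normal((n, n))
X = G @ G.T                        # random symmetric PSD point
D = rng.standard_normal((n, n))
D = (D + D.T) / 2                  # symmetric perturbation direction

t = 1e-6
finite_diff = (F(X + t * D) - F(X - t * D)) / (2 * t)   # central difference
analytic = np.sum(grad_F(X) * D)                        # <grad F(X), D>
print(finite_diff, analytic)
```

The two printed numbers should agree to several digits when the eigenvalues of `X` are well separated from the perturbation scale `t`.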

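Proof of Theorem 3: First, we show that the sequence
$\{(X_k, Y_k, Z_k)\}$ is bounded and convergent. Since $F_\delta(Y)$ and
$F_\delta(Z)$ are concave, we can write that, for every $Y \in \mathbb{S}_+^{n_1}$,
$Z \in \mathbb{S}_+^{n_2}$,

$$\begin{aligned}
F_\delta(Y) &\le F_\delta(Y_k) + \langle Y - Y_k, \nabla F_\delta(Y_k) \rangle \triangleq H_\delta(Y, Y_k), \\
F_\delta(Z) &\le F_\delta(Z_k) + \langle Z - Z_k, \nabla F_\delta(Z_k) \rangle \triangleq H_\delta(Z, Z_k).
\end{aligned}$$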


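In the MM step, the next point is updated by

$$\begin{aligned}
(X_{k+1}, Y_{k+1}, Z_{k+1}) = \operatorname*{argmin}_{(X, Y, Z)} \ & H_\delta(Y, Y_k) + H_\delta(Z, Z_k) \\
\text{subject to} \ & (X, Y, Z) \in \mathcal{F},
\end{aligned}$$

where

$$\mathcal{F} = \Big\{ (X, Y, Z) \ \Big|\ \mathcal{A}(X) = b,\ \begin{bmatrix} Y & X \\ X^T & Z \end{bmatrix} \succeq 0 \Big\}$$

denotes the feasible set. Clearly, for every $(X, Y, Z) \in \mathcal{F}$,

$$H_\delta(Y_{k+1}, Y_k) + H_\delta(Z_{k+1}, Z_k) \le H_\delta(Y, Y_k) + H_\delta(Z, Z_k).$$

Therefore, for all $k$,

$$\begin{aligned}
F_\delta(Y_{k+1}) + F_\delta(Z_{k+1}) &\le H_\delta(Y_{k+1}, Y_k) + H_\delta(Z_{k+1}, Z_k) \\
&\le H_\delta(Y_k, Y_k) + H_\delta(Z_k, Z_k) \\
&= F_\delta(Y_k) + F_\delta(Z_k).
\end{aligned} \tag{27}$$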

From (27) and $F_\delta(Y_k), F_\delta(Z_k) \ge 0$, we can conclude that
the sequence $\{F_\delta(Y_k) + F_\delta(Z_k)\}$ is convergent. Assume that
(13) is initialized with $(Y_0, Z_0)$. We have

$$F_\delta(Y_k) + F_\delta(Z_k) \le F_\delta(Y_0) + F_\delta(Z_0), \quad \forall k \ge 1,$$

showing $\{Y_k\}$ and $\{Z_k\}$ are bounded. Moreover, from the
constraints

$$\begin{bmatrix} Y_k & X_k \\ X_k^T & Z_k \end{bmatrix} \succeq 0,$$

a standard matrix-analysis result [Lem. 3.5.12] implies that there is a matrix $C$ with
$\sigma_i(C) \le 1$ such that $X_k = Y_k^{1/2} C Z_k^{1/2}$, proving that $\{X_k\}$
is also bounded. To show that these sequences are convergent
too, we start by applying Lemma 7 on $F_\delta(Y)$ and $F_\delta(Z)$ to
get

$$\frac{m}{2}\|Y_{k+1} - Y_k\|_F^2 \le F_\delta(Y_k) - F_\delta(Y_{k+1}) + \langle Y_{k+1} - Y_k, \nabla F_\delta(Y_k) \rangle, \tag{28}$$

$$\frac{m}{2}\|Z_{k+1} - Z_k\|_F^2 \le F_\delta(Z_k) - F_\delta(Z_{k+1}) + \langle Z_{k+1} - Z_k, \nabla F_\delta(Z_k) \rangle. \tag{29}$$


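From (13), we have

$$\begin{aligned}
(X_{k+1}, Y_{k+1}, Z_{k+1}) = \operatorname*{argmin}_{(X, Y, Z)} \ & \langle \nabla F_\delta(Y_k), Y \rangle + \langle \nabla F_\delta(Z_k), Z \rangle \\
\text{subject to} \ & (X, Y, Z) \in \mathcal{F}.
\end{aligned}$$

As a consequence,

$$\langle \nabla F_\delta(Y_k), Y_{k+1} \rangle + \langle \nabla F_\delta(Z_k), Z_{k+1} \rangle \le \langle \nabla F_\delta(Y_k), Y_k \rangle + \langle \nabla F_\delta(Z_k), Z_k \rangle,$$

or,

$$\langle Y_{k+1} - Y_k, \nabla F_\delta(Y_k) \rangle + \langle Z_{k+1} - Z_k, \nabla F_\delta(Z_k) \rangle \le 0.$$

Combining (28) and (29) and knowing that
$\langle Y_{k+1} - Y_k, \nabla F_\delta(Y_k) \rangle + \langle Z_{k+1} - Z_k, \nabla F_\delta(Z_k) \rangle$ is nonpositive, it
can be obtained that

$$\frac{m}{2}\|Y_{k+1} - Y_k\|_F^2 + \frac{m}{2}\|Z_{k+1} - Z_k\|_F^2 \le F_\delta(Y_k) - F_\delta(Y_{k+1}) + F_\delta(Z_k) - F_\delta(Z_{k+1}),$$

and, consequently,

$$\frac{m}{2}\|Y_{k+1} - Y_k\|_F^2 \le F_\delta(Y_k) - F_\delta(Y_{k+1}) + F_\delta(Z_k) - F_\delta(Z_{k+1}), \tag{30}$$

$$\frac{m}{2}\|Z_{k+1} - Z_k\|_F^2 \le F_\delta(Y_k) - F_\delta(Y_{k+1}) + F_\delta(Z_k) - F_\delta(Z_{k+1}). \tag{31}$$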

Summing over $k$, it follows from (30) and (31) that

$$\frac{m}{2}\sum_{k=0}^{N} \|Y_{k+1} - Y_k\|_F^2 \le F_\delta(Y_0) + F_\delta(Z_0),$$

$$\frac{m}{2}\sum_{k=0}^{N} \|Z_{k+1} - Z_k\|_F^2 \le F_\delta(Y_0) + F_\delta(Z_0).$$

This shows that $\frac{m}{2}\sum_{k=0}^{N} \|Y_{k+1} - Y_k\|_F^2$ and
$\frac{m}{2}\sum_{k=0}^{N} \|Z_{k+1} - Z_k\|_F^2$ converge when $N \to \infty$, which,
in turn, proves that $\{Y_k\}$ and $\{Z_k\}$ are convergent.
Following the same line of argument made to show that
$\{X_k\}$ is bounded, convergence of $\{X_k\}$ follows from
$\begin{bmatrix} Y_k & X_k \\ X_k^T & Z_k \end{bmatrix} \succeq 0$ and convergence of $\{Y_k\}$, $\{Z_k\}$. To show
that $\{(X_k, Y_k, Z_k)\}$ converges to a local minimum of (11),
we cast (13) as a standard SDP. First, note that

$$\mathcal{A}(X) = b \iff \langle A_i, X \rangle = b_i, \quad i = 1, \ldots, m,$$


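for some $A_i \in \mathbb{R}^{n_1 \times n_2}$. By introducing

$$W = \begin{bmatrix} Y & X \\ X^T & Z \end{bmatrix}, \quad
C = \begin{bmatrix} \nabla F_\delta(Y_k) & 0 \\ 0 & \nabla F_\delta(Z_k) \end{bmatrix}, \quad
A'_i = \frac{1}{2}\begin{bmatrix} 0 & A_i \\ A_i^T & 0 \end{bmatrix},$$

(13) converts to

$$\begin{aligned}
\min_{W} \ & \mathrm{trace}(CW) \\
\text{subject to} \ & \mathrm{trace}(A'_i W) = b_i, \quad i = 1, \ldots, m, \\
& W \succeq 0.
\end{aligned} \tag{32}$$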
Let $\{X_k, Y_k, Z_k\} \to \{X^*, Y^*, Z^*\}$ as $k \to \infty$. The
Karush-Kuhn-Tucker (KKT) conditions for (32) [40] imply that
$\exists\, y^* \in \mathbb{R}^m$, $S^* \in \mathbb{S}^{n_1 + n_2}$ such that

• $\sum_{i=1}^{m} y_i^* A'_i + S^* = \begin{bmatrix} \nabla F_\delta(Y^*) & 0 \\ 0 & \nabla F_\delta(Z^*) \end{bmatrix}$,

• $\mathrm{trace}(A'_i W^*) = b_i, \ i = 1, \ldots, m$,

• $S^* W^* = 0$,

• $S^* \succeq 0, \ W^* \succeq 0$,

where $W^* = \begin{bmatrix} Y^* & X^* \\ X^{*T} & Z^* \end{bmatrix}$. It can be easily verified that
the above conditions are the KKT conditions for the original
problem

$$\begin{aligned}
\min \ & F_\delta(Y) + F_\delta(Z) \\
\text{subject to} \ & \mathrm{trace}(A'_i W) = b_i, \quad i = 1, \ldots, m, \\
& W \succeq 0,
\end{aligned}$$

which together with (27) and concavity of the cost function
confirms that $(X^*, Y^*, Z^*)$ is a local minimizer of (11). ■
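
The linear majorizer $H_\delta(Y, Y_k) = F_\delta(Y_k) + \langle Y - Y_k, \nabla F_\delta(Y_k) \rangle$ that drives the MM step in the proof above can be checked numerically. The sketch below assumes $f_\delta(x) = 1 - e^{-x/\delta}$ and random PSD test points; the function names are hypothetical, and no SDP is solved here, only the majorization property is probed.

```python
import numpy as np

def F_delta(Y, delta):
    # F_delta(Y) = sum_i f(lambda_i(Y) / delta) with the assumed f(x) = 1 - exp(-x)
    lam = np.linalg.eigvalsh(Y)
    return np.sum(-np.expm1(-lam / delta))

def grad_F_delta(Y, delta):
    # U diag(f'(lambda / delta) / delta) U^T; acts as the weight matrix of the reweighted step
    lam, U = np.linalg.eigh(Y)
    return (U * (np.exp(-lam / delta) / delta)) @ U.T

def H_delta(Y, Yk, delta):
    # First-order (linear) majorizer of the concave F_delta around Yk
    return F_delta(Yk, delta) + np.sum((Y - Yk) * grad_F_delta(Yk, delta))

def random_psd(rng, n):
    G = rng.standard_normal((n, n))
    return G @ G.T

rng = np.random.default_rng(3)
delta = 2.0
Yk = random_psd(rng, 4)

violations = 0
for _ in range(500):
    Y = random_psd(rng, 4)
    if F_delta(Y, delta) > H_delta(Y, Yk, delta) + 1e-10:
        violations += 1

print(violations)
```

Since the majorizer touches $F_\delta$ at $Y_k$ and lies above it elsewhere, minimizing it over the feasible set cannot increase the cost, which is the descent property expressed by (27).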



Fig. 3. Averaged $\mathrm{SNR}_{\mathrm{rec}}$'s of the ICRA algorithm in solving ARM and
MC problems, plotted versus $c$ for 6 different numbers of measurements.
Matrix dimensions are set to $30 \times 30$, and $r$ is fixed to 6 in all simulations. To
obtain accurate estimates of the $\mathrm{SNR}_{\mathrm{rec}}$, 100 Monte-Carlo
simulations are run for each problem, and the results are averaged.


V. Numerical Experiments 

In this section, we present a numerical evaluation of the
performance of the ICRA algorithm. First, the effect of the
parameter $c$ on the accuracy of recovering low-rank matrices is
analyzed. Next, after proposing a suitable choice for $c$, the
evolution of the phase transitions of the ICRA algorithm in
solving MC and ARM problems as $\delta$ decreases from $\infty$
is illustrated. Finally, the superiority of the proposed algorithm
in MC and ARM problems is demonstrated via simulations.
Toward this end, ICRA is compared to NNM, the method of
[23], and SRF, which already outperforms some of the
state-of-the-art algorithms in the MC problem [15]. As mentioned
earlier, in [23], Fazel et al. proposed to replace (1) with (12).
To solve (12), they proposed to use a Majorize-Minimize
technique, which leads to solving the following SDP iteratively:

$$\begin{aligned}
(X_{k+1}, Y_{k+1}, Z_{k+1}) = \operatorname*{argmin}_{(X, Y, Z)} \ & \mathrm{trace}\big((Y_k + \alpha I_{n_1})^{-1} Y\big) + \mathrm{trace}\big((Z_k + \alpha I_{n_2})^{-1} Z\big) \\
\text{s.t.} \ & \begin{bmatrix} Y & X \\ X^T & Z \end{bmatrix} \succeq 0, \quad \mathcal{A}(X) = b.
\end{aligned}$$

Although this formulation appears to constitute an instance of (11), for this
replacement of the ARM, $f(x) = \log(x + \alpha)$ does not satisfy
some of the requirements in Property 1. This algorithm is
referred to as LGD (LoG-Determinant) in the sequel.

We use random matrices as solutions to (1) and (2) and
random linear operators in our simulations. In particular, to
generate a random matrix $X \in \mathbb{R}^{n_1 \times n_2}$ of rank $r$, $X_l \in \mathbb{R}^{n_1 \times r}$
and $X_r \in \mathbb{R}^{r \times n_2}$, whose entries are independently and
identically distributed (iid) from a zero-mean, unit-variance
Gaussian distribution $\mathcal{N}(0, 1)$, are generated. Then we set
$X = X_l X_r$. The constraints $\mathcal{A}(X) = b$ are converted
to $A\,\mathrm{vec}(X) = b$, where $A \in \mathbb{R}^{m \times n_1 n_2}$ is the matrix
representation of $\mathcal{A}$, and every element of $A$ is iid from
$\mathcal{N}(0, 1)$. Furthermore, in MC scenarios, revealed entries are
selected uniformly at random from all the elements of $X$.
Let $\widehat{X}$ designate the output of one of the above algorithms
to recover $X$. For measuring the accuracy of reconstruction,
we define $\mathrm{SNR}_{\mathrm{rec}} = 20 \log_{10}(\|X\|_F / \|X - \widehat{X}\|_F)$ in dB as the
reconstruction SNR. Furthermore, $d_r = r(n_1 + n_2 - r)$ denotes
the number of degrees of freedom for a real-valued matrix of
dimensions $n_1 \times n_2$ with rank $r$.
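
The data-generation procedure and the two quantities defined above can be sketched as follows. This is an illustrative NumPy version, not the authors' MATLAB code; the function names and the choice of $m = 2 d_r$ are assumptions made for the example.

```python
import numpy as np

def random_low_rank(n1, n2, r, rng):
    # X = X_l X_r with iid N(0, 1) factors, giving rank r almost surely
    X_l = rng.standard_normal((n1, r))
    X_r = rng.standard_normal((r, n2))
    return X_l @ X_r

def random_measurements(X, m, rng):
    # Matrix representation A of the linear operator, with b = A vec(X);
    # vec(X) stacks the columns of X, hence order="F"
    n1, n2 = X.shape
    A = rng.standard_normal((m, n1 * n2))
    b = A @ X.reshape(-1, order="F")
    return A, b

def degrees_of_freedom(n1, n2, r):
    # d_r = r (n1 + n2 - r)
    return r * (n1 + n2 - r)

def snr_rec(X, X_hat):
    # Reconstruction SNR in dB
    return 20.0 * np.log10(np.linalg.norm(X, "fro") / np.linalg.norm(X - X_hat, "fro"))

rng = np.random.default_rng(0)
X = random_low_rank(30, 30, 6, rng)
d_r = degrees_of_freedom(30, 30, 6)          # 6 * (30 + 30 - 6) = 324
A, b = random_measurements(X, 2 * d_r, rng)  # hypothetical choice m = 2 d_r
```

With these helpers, a reconstruction whose relative Frobenius error is $10^{-3}$ corresponds to a reconstruction SNR of 60 dB, the threshold used later to declare a recovery perfect.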

In all simulations, square matrices are considered, and
$n_1 = n_2 = n$ is always set to 30. Moreover, $\epsilon_1$ and
$\epsilon_2$ are always fixed to $10^{-2}$ to stop both internal and external loops
when the current solution changes by only 1% from the previous
one. $f(x) = 1 - e^{-x}$ is the UA function in all the following
experiments. All simulations are performed in the MATLAB 7.14
environment, and CVX [41] is used to solve the optimization problems.

Experiment 1. The parameter $c$ is used to control the decay
rate of $\delta$ in refining the rank approximation. More specifically,
at the $i$th iteration of the external loop, $\delta_i$ is set to $c\,\delta_{i-1}$.
The optimal choice of $c$ is a function of the aspects of the
problem under consideration. However, roughly speaking, as
the number of measurements decreases toward the degrees of
freedom of the solution, larger values of $c$ should be chosen.
In contrast, in problems with a larger ratio of the number of
measurements to the degrees of freedom, smaller values of $c$
lead to fewer iterations while $\mathrm{SNR}_{\mathrm{rec}}$ does not degrade
considerably.
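
The geometric decay of $\delta$ described above amounts to a one-line recursion. A minimal sketch follows, in which the starting value `delta0` and the number of external iterations are hypothetical inputs (the text here does not fix them).

```python
def delta_schedule(delta0, c, num_steps):
    """Return [delta_0, delta_1, ..., delta_num_steps] with delta_i = c * delta_{i-1}."""
    deltas = [delta0]
    for _ in range(num_steps):
        deltas.append(c * deltas[-1])
    return deltas

# Example with an assumed starting value and decay factor
schedule = delta_schedule(1.0, 0.5, 3)
print(schedule)
```

A smaller $c$ shrinks $\delta$ faster, so fewer external iterations are needed to reach a given accuracy of the rank approximation, at the price of a coarser continuation path.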

In this experiment, the above rule is numerically verified.
For $30 \times 30$ randomly generated matrices of rank 6, six
different ARM and MC problems are solved to cover cases where
$m/d_r$ is small or large, and $c$ is changed from 0.1 to 0.5. Trials
are repeated 100 times for each value of $m$, and the $\mathrm{SNR}_{\mathrm{rec}}$'s
are averaged over these trials. Figure 3 shows $\mathrm{SNR}_{\mathrm{rec}}$ as a
function of $c$. Clearly, when there is a sufficiently large number
of measurements, $\mathrm{SNR}_{\mathrm{rec}}$ is approximately independent of $c$.
Thus, since increasing $c$ gives rise to a larger number of
iterations, it should be chosen as small as possible. On the other
hand, for a smaller number of measurements, the reconstruction
SNR depends on $c$. However, after passing a critical value,
$\mathrm{SNR}_{\mathrm{rec}}$ remains approximately unchanged. Therefore, to have
the lowest computational complexity, $c$ should be selected a
bit above that critical value. Applying the above rule, in the
rest of the experiments, $c$ is chosen to be 0.2.

Fig. 4. Phase transition plots for the ICRA algorithm in solving the
MC problem as it proceeds with finer approximations of the rank function.
(a) corresponds to the NNM, which is used to initialize ICRA, and (b)-(d)
correspond to the next three consecutive iterations of the external loop.
Simulations are performed 50 times. The gray-scale color of each cell indicates
the rate of perfect recovery. White denotes a 100% recovery rate, and black
denotes a 0% recovery rate. A recovery is perfect if the $\mathrm{SNR}_{\mathrm{rec}}$ is greater than
60 dB.

Experiment 2. This experiment is devoted to analyzing the
performance of the proposed algorithm as it proceeds with
finer approximations of the rank function. To that end, the
phase transition graph, which, similar to the CS framework,
indicates the regions of perfect recovery and failure in solving
rank minimization problems, is utilized. To empirically
generate the phase transition graphs, $r$ is changed from 1 to
$n$, and, for a fixed $r$, $m$ is swept from $d_r$ to $n^2$. For every
pair $(r, m)$, 50 random realizations of $X$ are generated, and
empirical recovery rates according to the solutions obtained in
the initialization step and the next three consecutive iterations
of the external loop are calculated. This procedure is run for
both ARM and MC settings, and a solution is declared to be
recovered if the reconstruction SNR is greater than 60 dB.
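
The empirical recovery rate for one cell $(r, m)$ of such a phase transition graph can be sketched as below. The `solver` argument is a placeholder for any recovery routine; the `least_norm_solver` used in the example is NOT ICRA or NNM, but a simple minimum-Frobenius-norm stand-in chosen so that the sketch runs without an SDP solver. All names are hypothetical.

```python
import numpy as np

def recovery_rate(solver, n, r, m, trials, rng, threshold_db=60.0):
    """Fraction of random ARM instances recovered with SNR_rec above threshold_db."""
    successes = 0
    for _ in range(trials):
        X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
        A = rng.standard_normal((m, n * n))
        b = A @ X.reshape(-1, order="F")
        X_hat = solver(A, b, n)
        err = np.linalg.norm(X - X_hat, "fro")
        snr = np.inf if err == 0 else 20.0 * np.log10(np.linalg.norm(X, "fro") / err)
        if snr > threshold_db:
            successes += 1
    return successes / trials

def least_norm_solver(A, b, n):
    # Placeholder solver: minimum-norm solution of A vec(X) = b (ignores low rank)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    return x.reshape((n, n), order="F")

rng = np.random.default_rng(0)
n, r = 5, 1
rate_full = recovery_rate(least_norm_solver, n, r, m=n * n, trials=5, rng=rng)
rate_few = recovery_rate(least_norm_solver, n, r, m=r * (2 * n - r), trials=5, rng=rng)
print(rate_full, rate_few)
```

Sweeping $r$ from 1 to $n$ and $m$ from $d_r$ to $n^2$ with a low-rank solver in place of the placeholder would fill in the full grid; with the placeholder, recovery is expected only when $m = n^2$.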

Figures 4 and 5 show the results of this experiment for the MC
and ARM problems, respectively. The gray color of each cell indicates the
empirical recovery rate. White denotes perfect recovery in all
trials, and black shows unsuccessful recovery in all trials. As
clearly illustrated in these plots, when $\delta$ decreases, the region
of perfect recovery extends. In particular, the gain in the extension
is more significant at the first two iterations. Furthermore, our
experiments show that decreasing $\delta$ for more than four steps
does not boost the performance meaningfully.

Experiment 3. In this experiment, the ICRA algorithm is
compared to NNM, LGD, and SRF methods in solving ARM
and MC problems defined in (1) and (2), respectively. Two
criteria are used to this end: success rate and computational
complexity. We declare an algorithm to be successful in
recovery of the solution if $\mathrm{SNR}_{\mathrm{rec}}$ is greater than or equal to 60
dB. Consequently, the success rate of an algorithm denotes the
number of times it successfully recovered the solution divided
by the total number of trials, which is equal to 100 herein.
Furthermore, the number of SDPs each algorithm, except SRF,
needs to converge to a solution is reported as a measure of
complexity. Although a rough estimate of complexity, this
measure is independent of simulation hardware specifications
and can give insight into the order of computational loads of
the algorithms, as the order of computation is fully understood
for SDP solvers; see, e.g., [42]. We exclude SRF from this
complexity comparison because it has an efficient implementation,
whereas ICRA is realized by CVX as a proof-of-concept
version. In addition, the other competitors are also implementable
by SDP, while SRF is not.

Fig. 5. Phase transition plots for the ICRA algorithm in solving the ARM
problem as it proceeds with finer approximations of the rank function. (a)
corresponds to the NNM, which is used to initialize ICRA, and (b)-(d)
correspond to the next three consecutive iterations of the external loop. Other
conditions are as in Figure 4.

No stopping rule is specified in [23] for the LGD method,
and we use the distance between two consecutive iterations to
terminate it. To be precise, if $d = \|X_i - X_{i-1}\|_F / \|X_{i-1}\|_F <$
tol, where $X_i$ is the solution at the $i$th iteration, then the final
solution is $X_i$. In all the comparisons, tol is set to $10^{-4}$ since
we observed empirically that decreasing tol to smaller values
only increases the number of LGD iterations, whereas $\mathrm{SNR}_{\mathrm{rec}}$
does not improve meaningfully. The SRF algorithm is executed
with $c = 0.85$, $\mu = 1$, $L = 8$, and $\epsilon = 10^{-5}$.
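
The termination test used for LGD is a relative-change criterion between consecutive iterates. A minimal sketch, with a hypothetical function name and the tolerance of $10^{-4}$ taken from the text:

```python
import numpy as np

def has_converged(X_curr, X_prev, tol=1e-4):
    # d = ||X_i - X_{i-1}||_F / ||X_{i-1}||_F, compared against tol
    d = np.linalg.norm(X_curr - X_prev, "fro") / np.linalg.norm(X_prev, "fro")
    return d < tol

X_prev = np.eye(3)
small_change = has_converged(X_prev * (1 + 1e-5), X_prev)   # relative change 1e-5
large_change = has_converged(X_prev * 1.01, X_prev)         # relative change 1e-2
print(small_change, large_change)
```

The same test with a tolerance of $10^{-2}$ would correspond to the 1% change criterion described for the internal and external loops of ICRA.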

Figures 6(a)-6(c) plot the success rate for ICRA, SRF, LGD,
and NNM as well as the number of SDP iterations for ICRA and
LGD as a function of $m/d_r$ in solving MC problems with
$r = 2$, 5, and 10, respectively. In these plots, the left-hand
side vertical axis shows the average number of SDPs used to
obtain the final solution, and the right-hand side vertical axis
displays the success rate. Furthermore, a solid trace depicts the
success rate of an algorithm, while the dashed trace of the same color
shows the number of SDP iterations of the same algorithm. For
instance, the black solid trace shows the success rate for the
ICRA algorithm, and the dashed black one displays its total
number of iterations. The NNM method always gives a solution
after execution of a single SDP, so, to have more organized plots,
this result is not shown.

Fig. 6. Comparison of the ICRA algorithm to the SRF [15], LGD [23], and NNM methods in terms of success rate and complexity. In these plots, the
left-hand side vertical axis denotes the number of iterations each algorithm, except for SRF and NNM, needs to converge. In addition, the right-hand side
vertical axis displays the so-called success rate. A solid trace represents the success rate of an algorithm, and a dashed trace shows the number of SDPs the
same algorithm used to find the solutions. Trials are repeated 100 times, and results are averaged over them.

It is clear from these results that, for the MC problems,
ICRA can recover the solutions with a considerably smaller
number of measurements, and SRF stands in second place
in this comparison. In particular, when $r$ equals 10 and the
number of measurements is less than 1.2 times the matrix
degrees of freedom, solutions can be recovered by ICRA
with a recovery rate close to 1. So far as the complexity of
ICRA is concerned, while the average number of iterations can
exceed 17, when $m$ increases toward values at which the success
rate is about 1, the number of iterations continuously decreases
and becomes equal to 2 when LGD starts to recover solutions
with a success rate of 100%. Also, when LGD starts to recover
the solutions, its number of iterations suddenly increases up
to 21 for $r = 2$, whereas 5 iterations on average suffice for
ICRA to converge.

The strength of ICRA in ARM is also shown in Figures 6(d)-6(h).
Under the same conditions as explained before, (1) is
solved for $r = 2, 5, 10, 15$, and 20. To sum up the results, LGD
and NNM have very close success rates in all simulations, and
ICRA consistently outperforms both of them. As $r$ increases,
the minimum $m/d_r$ at which ICRA can perfectly recover
solutions decreases and, in particular, it needs just 5% more
measurements than the solution degrees of freedom to recover
with rate 1 when $r$ is equal to 20. Similar to the MC case, the
average number of ICRA iterations is a declining function
of $m$ and decreases to 2 when NNM and LGD start to
recover the solutions. In fact, since ICRA is initialized with
the minimum nuclear-norm solution, when the global solution
is attainable by nuclear norm minimization, ICRA maintains
this solution and terminates after two iterations. This may be
justified as follows. From Theorem 2, we expect that if (3)
and (1) share the same global solution, (6) also shares the same
minimizer. Moreover, Theorem 3 guarantees the convergence
since ICRA is initialized by the global solution and the cost
function does not increase at any iteration.

These experiments demonstrate that even though our
performance analysis predicts only that, in comparison to NNM, ICRA
requires a smaller or equal number of measurements to uniquely
recover the solutions, in practice a strictly smaller number of measurements
suffices for its success. Furthermore, it seems that the proposed
optimization approach can find a global minimum over
a wide range of $m$'s in the presented numerical examples.

VI. Conclusion 

The problem of approximation of $\mathrm{rank}(X)$ in ARM and
MC settings was considered by formulating it as $\mathrm{rank}(X) =
\sum_{i=1}^{n} u(\sigma_i(X))$. To simplify this task, we focused on the
approximation of the unit step function and proposed a class
of subadditive functions which closely match the unit
step. The concavity and differentiability of the resulting matrix
functions were characterized, proving that they are concave
and differentiable for PSD matrices. Using a lemma from
[23], we generalized the concave approximation to arbitrary
nonsquare matrices. To handle the nonconvexity of the
optimization problem, we used a series of optimizations, where
the quality of the approximation is successively increased.
Furthermore, to theoretically support the proposed algorithm,
we presented a theorem proving the superiority of the
proposed approximation to NNM. Then we examined the
performance of the ICRA algorithm via numerical examples in
both ARM and MC problems. These examples showed that
though the computational complexity is high in comparison to
NNM, LGD, and SRF, ICRA can recover low-rank matrices
with a number of measurements close to the intrinsic unique
representation lower bound. The decrease in the number of
measurements, in comparison to NNM, was up to 50% in the
performed numerical simulations.